Testing Prometheus Metrics In Integration Tests In Golang

Integration tests check how our application integrates with external dependencies. To do so, we usually run Docker containers that emulate those services.

Examples of external services include:

Database
Third-party API
Pub/Sub Service
Mailing Service
…

For this kind of test I usually check the important outcome that the application produced: for example an HTTP response, a message in a queue, or an email.

I usually avoid checking metrics, since they are meant to help operate the application but are not domain requirements, at least in the businesses I’ve worked in so far.

Still, sometimes the only way to check the outcome of the flow under test is to check a registered metric. The case I faced recently was a background job that checks the integrity of our data by matching it against a third-party service.

This process is key for the business, since it alerts the developers when we have failed to process a transaction from the third-party service. Checking the metrics therefore became necessary.

In that case I needed to read the initial state of the metric and then check that it changed as expected. This lets us assert on increments rather than fixed values, which makes the tests more robust.

Get Prometheus values from a server

To get the current metric value of a counter I prepared this function:

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
	"strconv"
	"testing"

	"github.com/stretchr/testify/require"
)

func GetPrometheusCounter(t *testing.T, serverURL, metricName, metricTags string) int {
	// we call the server to get the metrics
	resp, err := http.Get(serverURL + "/metrics")
	require.NoError(t, err)
	defer func() {
		require.NoError(t, resp.Body.Close())
	}()

	// we read all the content of the response body
	body, err := io.ReadAll(resp.Body)
	require.NoError(t, err)

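	// we prepare a regex that will search for counters; QuoteMeta escapes any
	// regex metacharacters in the name and tags, and the braces are escaped explicitly
	re := regexp.MustCompile(fmt.Sprintf(`%s\{%s\} (\d+)`, regexp.QuoteMeta(metricName), regexp.QuoteMeta(metricTags)))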

	// we run the regex
	matches := re.FindStringSubmatch(string(body))
	if len(matches) < 2 {
		t.Logf("could not find metric %s{%s}", metricName, metricTags)

		return 0
	}

	// we convert the string result to integer
	i, err := strconv.Atoi(matches[1])
	require.NoError(t, err)

	return i
}

With that we can easily check the current value:

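// metricTags is passed without the surrounding braces, exactly as it
// appears in the /metrics output (label values are quoted)
irc := GetPrometheusCounter(t, serverURL, "myapp_requests_counter", `app="myapp"`)

// run the code we want to test here

crc := GetPrometheusCounter(t, serverURL, "myapp_requests_counter", `app="myapp"`)
require.Equal(t, 2, crc-irc)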
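
Prometheus exposes sample values as floats (a large counter can be rendered as 1e+06, for instance), and label values are quoted in the output. A slightly more defensive version of the parsing step can be sketched as follows. This is only an illustration under some assumptions: parseCounter is a hypothetical name, it works on a body that has already been fetched, and it expects the labels to be passed exactly as they appear in the /metrics output.

```go
package main

import (
	"regexp"
	"strconv"
)

// parseCounter is a hypothetical helper (not part of any library).
// It looks for a line of the form
//
//	name{labels} value
//
// in a Prometheus text exposition body and returns the value as a float64.
// The boolean is false when the series is absent or the value is not a number.
func parseCounter(body, name, labels string) (float64, bool) {
	// (?m) makes ^ match at the start of every line.
	// QuoteMeta escapes regex metacharacters in the name and labels.
	pattern := `(?m)^` + regexp.QuoteMeta(name) + `\{` + regexp.QuoteMeta(labels) + `\} (\S+)`
	re := regexp.MustCompile(pattern)

	m := re.FindStringSubmatch(body)
	if len(m) < 2 {
		return 0, false
	}

	// ParseFloat accepts both "2" and "1e+06".
	v, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}

	return v, true
}
```

Returning a boolean instead of silently falling back to 0 lets the caller decide whether a missing series is acceptable, for example before the first increment of a counter.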

Using prometheus/client_golang/prometheus/testutil

With this function we could already do a complete test as we’ve seen in the previous example.

While looking into how to do that, I found the package github.com/prometheus/client_golang/prometheus/testutil, which contains helpers for checking Prometheus metrics. In our case testutil.ScrapeAndCompare looked very interesting, since it scrapes the metrics from the server and compares them with the expected content provided through a reader.

To use it I prepared the following function:

import (
	"fmt"
	"strings"
	"testing"
	"time"

	"github.com/prometheus/client_golang/prometheus/testutil"
	"github.com/stretchr/testify/require"
)

func ExpectPrometheusCounter(t *testing.T, serverURL, metricName, metricTags string, expectedValue int) {
	// Eventuallyf allows us to retry for a limited amount of time
	// We'll retry every 100 milliseconds for up to 1 second
	require.Eventuallyf(t, func() bool {
		// this is the expected content
		expected := fmt.Sprintf(`
# TYPE %s counter
%s{%s} %d
`, metricName, metricName, metricTags, expectedValue)

		// ScrapeAndCompare will check the expected content with the current server metrics
		if err := testutil.ScrapeAndCompare(serverURL+"/metrics", strings.NewReader(expected), metricName); err != nil {
			t.Log(err.Error())
			return false
		}

		return true
	}, time.Second, 100*time.Millisecond, "could not find metric %s with tags %s at value %d", metricName, metricTags, expectedValue)
}

Then the initial example would end up like this:

irc := GetPrometheusCounter(t, serverURL, "myapp_requests_counter", `app="myapp"`)

// run the code we want to test here

ExpectPrometheusCounter(t, serverURL, "myapp_requests_counter", `app="myapp"`, irc+2)

This option has the advantage that it retries every 100 milliseconds for up to one second, which helps when the metric is updated asynchronously, as with a background job.

Conclusion

In conclusion, while metrics may not always be considered primary business requirements, testing them in integration scenarios proves invaluable for maintaining the overall health and reliability of our applications.