The Problem: Flying Blind Without RED Metrics

Running microservices without proper observability is like navigating a ship in thick fog. You might be moving, but you have no idea if you’re hitting icebergs until it’s too late. For gRPC services, tracking RED metrics (Rate, Errors, Duration) is paramount to understanding performance and stability. My goal was to add this crucial observability to a Golang gRPC microservice.

My first thought was to manually wrap every single gRPC method call with Prometheus metric increments. However, I quickly realized this approach was deeply flawed. It would introduce significant boilerplate, be incredibly tedious to maintain across numerous service methods, and worst of all, it would be highly prone to errors. Forgetting to defer a timer or an error increment could lead to silent monitoring failures, defeating the entire purpose. I needed a more elegant, global solution.

The Journey: Navigating Interceptors and Gotchas

I knew interceptors were the right tool for the job. gRPC interceptors allow you to intercept incoming (and outgoing) requests and perform actions globally, such as logging, authentication, or, in this case, metric collection. The grpc-ecosystem/go-grpc-middleware library immediately stood out as the standard choice for building gRPC middleware in Go.

However, navigating the go-grpc-middleware ecosystem had a few quirks. I quickly discovered that the older go-grpc-prometheus library (v1) is largely deprecated in favor of the newer providers packages within the go-grpc-middleware project itself, specifically go-grpc-middleware/providers/prometheus. This was an important distinction to avoid using outdated patterns.

A critical gotcha I encountered, and one that often trips up new users, is the fundamental difference between gRPC and Prometheus scraping. gRPC runs over HTTP/2, a binary protocol. Prometheus, on the other hand, expects to scrape metrics from a standard HTTP endpoint, typically /metrics, serving plain text. You cannot simply attach the Prometheus metrics to the gRPC server itself. This meant I needed to spin up a secondary, lightweight HTTP server specifically to expose the /metrics endpoint for Prometheus to scrape. This separate server would host the Prometheus registry, distinct from the gRPC server’s listener.

The Solution: Centralized Server-Side Monitoring

The clean and idiomatic solution involves using the go-grpc-middleware/providers/prometheus package. This allows me to implement comprehensive RED metric collection for all gRPC methods without touching the business logic of each service.

Here’s the breakdown of the server-side implementation:

  1. Setup Prometheus Metrics: I start by initializing prom.NewServerMetrics(). This object holds all the necessary Prometheus collectors (counters, histograms) for gRPC server-side metrics. I configure it with custom histogram buckets for handling times to get more granular performance insights. I then create a prometheus.NewRegistry() and register my server metrics with it. This custom registry is crucial for isolating the gRPC metrics from the default Prometheus registry, especially in more complex applications.

  2. Setup gRPC Server with Interceptors: I then create my grpc.NewServer(). The key here is to inject the srvMetrics.UnaryServerInterceptor() and srvMetrics.StreamServerInterceptor() using grpc.ChainUnaryInterceptor() and grpc.ChainStreamInterceptor(). This ensures that every unary and stream RPC call will automatically pass through my Prometheus interceptors, collecting metrics without any manual effort.

  3. Initialize Metrics for Services: After registering any actual gRPC service implementations (not shown in the minimal example but essential in a real application), I call srvMetrics.InitializeMetrics(grpcServer). This step is crucial because it ensures that Prometheus metrics are correctly initialized for all the gRPC methods exposed by the registered services, even if they haven’t been called yet.

  4. Start HTTP Server for Prometheus Scraper: As discussed, I spin up a standard Go HTTP server in a separate goroutine. This server listens on a distinct port (e.g., :9092) and exposes the /metrics endpoint. This endpoint uses promhttp.HandlerFor(reg, promhttp.HandlerOpts{}) to serve the metrics from the custom Prometheus registry I created earlier. This is the endpoint Prometheus will scrape.

  5. Start the gRPC Server: Finally, I start the main gRPC server, listening on its designated port (e.g., :50051).

Here’s the complete server-side setup:

package main

import (
	"log"
	"net"
	"net/http"

	prom "github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// 1. Setup Prometheus metrics
	srvMetrics := prom.NewServerMetrics(
		prom.WithServerHandlingTimeHistogram(
			prom.WithHistogramBuckets([]float64{0.001, 0.01, 0.1, 0.3, 0.6, 1, 3, 6, 9, 20}),
		),
	)
	reg := prometheus.NewRegistry()
	reg.MustRegister(srvMetrics)

	// 2. Setup the gRPC Server with Interceptors
	grpcServer := grpc.NewServer(
		grpc.ChainUnaryInterceptor(srvMetrics.UnaryServerInterceptor()),
		grpc.ChainStreamInterceptor(srvMetrics.StreamServerInterceptor()),
	)

	// (Register your gRPC service implementations here, e.g.)
	// pb.RegisterYourServiceServer(grpcServer, &yourService{})

	// 3. Initialize metrics for all registered services
	srvMetrics.InitializeMetrics(grpcServer)

	// 4. Start the HTTP server for Prometheus scraping in a goroutine
	go func() {
		// A dedicated mux keeps the metrics endpoint off the global
		// http.DefaultServeMux, avoiding surprises from other packages.
		mux := http.NewServeMux()
		mux.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
		log.Printf("Starting Prometheus metrics server on :9092")
		if err := http.ListenAndServe(":9092", mux); err != nil {
			log.Fatalf("Failed to start metrics server: %v", err)
		}
	}()

	// 5. Start the gRPC server
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	log.Printf("Starting gRPC server on :50051")
	if err := grpcServer.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
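Once this is running and scraped, the RED signals fall out of standard PromQL queries. The metric names below (grpc_server_handled_total with its grpc_code label, and grpc_server_handling_seconds_bucket from the opt-in histogram) are the ones these libraries have historically exported; verify them against your /metrics output before building dashboards.

```promql
# Rate: handled RPCs per second, per method
sum by (grpc_method) (rate(grpc_server_handled_total[5m]))

# Errors: RPCs per second that finished with a non-OK status code
sum by (grpc_method) (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))

# Duration: p99 handling latency, per method (requires the histogram option)
histogram_quantile(0.99,
  sum by (le, grpc_method) (rate(grpc_server_handling_seconds_bucket[5m])))
```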

Client-Side Observability

The client-side monitoring follows a very similar pattern. Instead of prom.NewServerMetrics(), you would use prom.NewClientMetrics(). These client metrics would then be injected into your grpc.Dial() call using grpc.WithChainUnaryInterceptor() and grpc.WithChainStreamInterceptor(). This provides a consistent way to track RED metrics from the perspective of the gRPC client, giving you full end-to-end visibility.

Summary

By leveraging go-grpc-middleware/providers/prometheus, I was able to implement robust and automated RED metric collection for my Golang gRPC services. This approach completely eliminates the need for manual instrumentation, significantly reduces boilerplate, and ensures consistent monitoring across all RPC methods. The crucial aspect of running a separate HTTP server for Prometheus scraping is a common pattern for good reason and ensures your metrics are always accessible. This setup provides critical insights into service performance and reliability, moving from flying blind to confidently navigating my microservice landscape.