
The Definitive Guide to Structured Logging in 2024

Alex Chen

October 24, 2024 • 12 min read

Unstructured logs are the silent killers of observability. Here is how to build a logging strategy that scales, survives outages, and gives you the signal you need.

[Figure: structured logging diagram showing key-value pairs]

Why unstructured logs fail at scale

In the early days of a startup, logging is simple. You have one service, one database, and a single developer on call. You can afford to log strings like "User 404 not found at 14:02" and grep them when things break.

But as you scale to microservices, distributed systems, and multiple teams, that approach collapses. You end up with millions of lines of text where field names are inconsistent, timestamp formats and time zones vary from service to service, and search queries break whenever someone rewords a message.

The result is the "grep-and-pray" debugging cycle. You spend more time filtering noise than finding the signal. This isn't just an inconvenience; it's a liability. When a critical error occurs in production, you need to know who triggered it, where in the stack it happened, and what the state was at that exact millisecond. Unstructured logs cannot provide that.

What is structured logging?

Structured logging is the practice of logging data as a sequence of key-value pairs, typically in JSON format. Instead of a narrative sentence, you log events as data that can be parsed, indexed, and queried programmatically.

For example, an unstructured log might look like this:

unstructured.log

2024-10-24T14:02:15Z ERROR Database connection failed for user 9921

A structured log, however, looks like this:

structured.json

{
  "timestamp": "2024-10-24T14:02:15Z",
  "level": "error",
  "service": "auth-api",
  "user_id": 9921,
  "error_code": "DB_TIMEOUT",
  "stack_trace": "..."
}

Notice that user_id is a number, not a string, and error_code is a specific constant rather than free text. Because the format is machine-readable, your observability platform can index every field without custom parsing rules.
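
To make this concrete, here is a minimal sketch that emits a similar event with Go's standard log/slog package (Go 1.21+) and reads it back as data. The function names are illustrative, and slog's default keys are time, level, and msg, so the output differs slightly from the hand-written JSON above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// emitDBTimeout logs one structured event and returns the raw JSON line.
// The service name and error code mirror the example event in the text.
func emitDBTimeout(userID int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil)).With("service", "auth-api")
	logger.Error("database connection failed",
		"user_id", userID, // stays a JSON number, not a string
		"error_code", "DB_TIMEOUT",
	)
	return buf.String()
}

// decode parses one JSON log line back into a field map.
func decode(line string) (map[string]any, error) {
	var fields map[string]any
	err := json.Unmarshal([]byte(line), &fields)
	return fields, err
}

func main() {
	line := emitDBTimeout(9921)
	fmt.Print(line)

	fields, err := decode(line)
	if err != nil {
		panic(err)
	}
	// Every field is addressable by name, with no regex involved.
	fmt.Println(fields["user_id"], fields["error_code"])
}
```

The round trip is the point: anything a consumer needs is a key lookup, not a pattern match against a sentence.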

Choosing a log schema

Not all structured logs are created equal. The power of structured logging comes from convention. If every service in your ecosystem logs a trace_id, you can correlate events across services. If they don't, you're back to square one.

Standard Fields

Every log event should include these mandatory fields:

timestamp: ISO 8601 format (e.g., 2024-10-24T14:02:15Z).
level: debug, info, warn, error.
service: The name of the microservice emitting the log.
environment: dev, staging, prod.
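
One way to enforce these fields is to hand every service the same pre-configured logger. The sketch below assumes Go's log/slog; newBaseLogger and sampleEvent are hypothetical names, and the ReplaceAttr hook is used only to rename slog's built-in time key and lowercase the level so the output matches the schema above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"strings"
	"time"
)

// newBaseLogger returns a JSON logger that stamps every event with the
// mandatory fields. The name and signature are illustrative.
func newBaseLogger(w io.Writer, service, environment string) *slog.Logger {
	opts := &slog.HandlerOptions{
		Level: slog.LevelDebug,
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) > 0 {
				return a // only rewrite top-level attributes
			}
			switch a.Key {
			case slog.TimeKey:
				// Rename "time" to "timestamp" and force UTC in RFC 3339,
				// which is a profile of ISO 8601.
				a.Key = "timestamp"
				if t, ok := a.Value.Any().(time.Time); ok {
					a.Value = slog.StringValue(t.UTC().Format(time.RFC3339))
				}
			case slog.LevelKey:
				// slog emits "INFO", "WARN", and so on; the schema is lowercase.
				if lvl, ok := a.Value.Any().(slog.Level); ok {
					a.Value = slog.StringValue(strings.ToLower(lvl.String()))
				}
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(w, opts)).With(
		"service", service,
		"environment", environment,
	)
}

// sampleEvent logs one warning through the base logger and decodes it.
func sampleEvent() (map[string]any, error) {
	var buf bytes.Buffer
	newBaseLogger(&buf, "checkout", "staging").Warn("cart.expired", "cart_id", 17)
	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	fields, err := sampleEvent()
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["timestamp"], fields["level"], fields["service"], fields["environment"])
}
```

Because the fields are attached once at construction, individual call sites cannot forget them.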

Contextual Fields

For distributed tracing, include:

trace_id: A unique identifier for the request flow.
span_id: The specific operation within the trace.
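
If you are not already using a tracing SDK that generates these for you, a rough sketch of ID generation looks like the following. The sizes (16 bytes for a trace, 8 bytes for a span) follow the W3C Trace Context convention; the function names are made up for this example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomHex returns n random bytes encoded as 2n lowercase hex characters.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err) // the system entropy source is unavailable
	}
	return hex.EncodeToString(b)
}

// newTraceID identifies the whole request flow (16 bytes, 32 hex characters).
func newTraceID() string { return randomHex(16) }

// newSpanID identifies one operation within the trace (8 bytes, 16 hex characters).
func newSpanID() string { return randomHex(8) }

func main() {
	// One trace ID is created at the edge and reused by every service;
	// each operation along the way gets its own span ID.
	fmt.Println("trace_id:", newTraceID())
	fmt.Println("span_id: ", newSpanID())
	fmt.Println("span_id: ", newSpanID())
}
```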

Context propagation

The most common mistake developers make is forgetting to pass context. If a request starts in the API Gateway and is routed to a Worker service without it, the Worker service has no idea which user made the request or which trace it belongs to.

You must propagate context through your call stack and across service boundaries (for example, in a request header). In Go, the context package carries it within a process. In other languages, use thread-local storage or dependency injection.

go/main.go

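package main

import (
  "context"
  "net/http"
)

// ctxKey is unexported so the key cannot collide with other packages' keys.
type ctxKey string

const traceIDKey ctxKey = "trace_id"

// LoggingMiddleware extracts the trace ID from the request header.
func LoggingMiddleware(next http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    traceID := r.Header.Get("X-Trace-ID")
    // Inject into context and pass it to the next handler
    ctx := context.WithValue(r.Context(), traceIDKey, traceID)
    next.ServeHTTP(w, r.WithContext(ctx))
  })
}

go/service.go

package main

import (
  "context"
  "log/slog"
)

func ProcessOrder(ctx context.Context, orderID int) {
  // Retrieve trace ID from context; the checked assertion avoids a panic if it is missing
  traceID, _ := ctx.Value(traceIDKey).(string)

  // Uses the standard library's log/slog (Go 1.21+) as the structured logger
  slog.InfoContext(ctx, "order.processed",
    "trace_id", traceID,
    "order_id", orderID,
    "status", "success",
  )
}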
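
Pulling the trace ID out by hand in every function gets repetitive. One possible refinement, sketched here as a standalone program with log/slog, is a handler wrapper that reads the ID from the context on every log call. traceHandler, traceIDKey, and logWithTrace are hypothetical names, not part of any library.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// ctxKey is a private key type for values this example stores in a context.
type ctxKey string

const traceIDKey ctxKey = "trace_id"

// traceHandler wraps another slog.Handler and copies the trace ID from the
// context onto every record that passes through it.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey).(string); ok && id != "" {
		r = r.Clone() // avoid mutating state shared with the caller's copy
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so that loggers derived
// with logger.With(...) keep the trace behaviour.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// logWithTrace logs one event under a context carrying traceID and returns
// the decoded fields.
func logWithTrace(traceID string) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})
	ctx := context.WithValue(context.Background(), traceIDKey, traceID)
	logger.InfoContext(ctx, "order.processed", "order_id", 42, "status", "success")

	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	fields, err := logWithTrace("abc-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["trace_id"], fields["order_id"], fields["status"])
}
```

The trade-off is that callers must use the context-aware methods (InfoContext, ErrorContext, and so on); a plain Info call passes a background context and the field is silently omitted.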

Sampling strategies

As traffic grows, logging every single request becomes expensive and noisy. You need sampling to reduce volume without losing the ability to debug errors.

1. Random Sampling

Log 100% of errors, but only 10% of info-level logs. This is the simplest strategy and works well for most services.
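
A minimal sketch of that rule follows. It assumes everything below error level is sampled at the same rate, which goes slightly beyond what is stated above, and the sampler type is hypothetical. The random source is injectable so the decision can be tested deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampler decides whether an event at a given level should be emitted.
type sampler struct {
	rate   float64        // fraction of non-error events to keep, 0.0 to 1.0
	random func() float64 // returns a value in [0, 1); rand.Float64 in production
}

// keep always keeps errors and keeps other levels with probability s.rate.
func (s sampler) keep(level string) bool {
	if level == "error" {
		return true
	}
	return s.random() < s.rate
}

func main() {
	s := sampler{rate: 0.1, random: rand.Float64}

	kept := 0
	for i := 0; i < 10000; i++ {
		if s.keep("info") {
			kept++
		}
	}
	// The exact count varies from run to run but should land near 1000.
	fmt.Printf("kept %d of 10000 info events\n", kept)
	fmt.Println("error kept:", s.keep("error"))
}
```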

2. Header-based Sampling

Allow upstream components (like a load balancer) to inject a header (e.g., X-Sample-Rate: 0.1) that the SDK respects.
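
Whatever sets the header, treat its value as untrusted input. The sketch below shows one way an SDK might read it, falling back to a default when the header is missing, malformed, or out of range. parseSampleRate and rateForRequest are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// parseSampleRate converts a raw header value into a rate between 0 and 1.
// Anything missing, malformed, or out of range yields the fallback.
func parseSampleRate(raw string, fallback float64) float64 {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return fallback
	}
	rate, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(rate >= 0 && rate <= 1) { // the negated form also rejects NaN
		return fallback
	}
	return rate
}

// rateForRequest reads the X-Sample-Rate header from an incoming request.
func rateForRequest(r *http.Request, fallback float64) float64 {
	return parseSampleRate(r.Header.Get("X-Sample-Rate"), fallback)
}

func main() {
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/orders", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Sample-Rate", "0.1")
	fmt.Println(rateForRequest(req, 1.0))

	req.Header.Set("X-Sample-Rate", "lots")
	fmt.Println(rateForRequest(req, 1.0))
}
```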

3. Adaptive Sampling

If your error rate spikes, automatically increase the sampling rate to capture more context. If it drops, decrease the rate again to save costs.
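
There are many ways to build the feedback loop. As one simple illustration, the sketch below scales the rate linearly between a base and a maximum according to the error ratio seen in the last window. The type, its fields, and the 5% threshold are assumptions chosen for the example.

```go
package main

import "fmt"

// adaptiveSampler maps the error ratio observed in the last window to a
// sampling rate for the next one.
type adaptiveSampler struct {
	baseRate       float64 // rate used when there are no errors
	maxRate        float64 // rate used at or above the threshold
	errorThreshold float64 // error ratio at which sampling reaches maxRate
}

// rateFor returns the sampling rate for a window with the given counts.
func (a adaptiveSampler) rateFor(errors, total int) float64 {
	if total == 0 {
		return a.baseRate // no traffic, nothing to react to
	}
	ratio := float64(errors) / float64(total)
	if ratio >= a.errorThreshold {
		return a.maxRate
	}
	// Scale linearly between baseRate and maxRate below the threshold.
	return a.baseRate + (a.maxRate-a.baseRate)*(ratio/a.errorThreshold)
}

func main() {
	a := adaptiveSampler{baseRate: 0.1, maxRate: 1.0, errorThreshold: 0.05}
	fmt.Println("healthy: ", a.rateFor(0, 1000))
	fmt.Println("degraded:", a.rateFor(25, 1000))
	fmt.Println("incident:", a.rateFor(100, 1000))
}
```

In a real service you would recompute the rate on a timer and smooth it, so that one noisy window does not cause the log volume to oscillate.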

Shipping logs to aggregators

Once your logs are structured, you need to get them to a central place. The choice of protocol and format depends on your infrastructure.

HTTP/JSON

The most common method. Simple to implement and works with any cloud provider. Ideal for short-lived containers.
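
As a rough sketch, a shipper can batch events as newline-delimited JSON and POST them in one request. The application/x-ndjson content type and the batch shape are assumptions here, so check what your aggregator's ingest endpoint actually accepts. A local test server stands in for the aggregator.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// encodeBatch serializes events as newline-delimited JSON, one event per line.
func encodeBatch(events []map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends a newline after each value
	for _, ev := range events {
		if err := enc.Encode(ev); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// shipBatch POSTs a batch to the given URL and treats any non-2xx status as an error.
func shipBatch(ctx context.Context, client *http.Client, url string, events []map[string]any) error {
	body, err := encodeBatch(events)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-ndjson")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("aggregator returned %s", resp.Status)
	}
	return nil
}

func main() {
	// The test server plays the role of the aggregator's ingest endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Printf("received %d bytes as %s\n", len(body), r.Header.Get("Content-Type"))
		w.WriteHeader(http.StatusAccepted)
	}))
	defer srv.Close()

	events := []map[string]any{
		{"level": "info", "service": "auth-api", "msg": "login.ok"},
		{"level": "error", "service": "auth-api", "error_code": "DB_TIMEOUT"},
	}
	client := &http.Client{Timeout: 5 * time.Second}
	if err := shipBatch(context.Background(), client, srv.URL, events); err != nil {
		panic(err)
	}
}
```

Batching matters for short-lived containers in particular: flush the final batch before the process exits, or the last few events never leave the host.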

gRPC

More efficient than HTTP/JSON for high-throughput scenarios, because it multiplexes requests over HTTP/2 and sends binary Protobuf payloads by default. Requires a gRPC server on the aggregator side.

Protobuf

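Protobuf is a serialization format rather than a transport, so it pairs with gRPC or plain HTTP. If you are shipping millions of logs per second, binary formats like Protobuf are typically much smaller than the equivalent JSON, reducing bandwidth usage.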


About the Author

Alex Chen is a Senior Backend Engineer at LogKit with over a decade of experience building distributed systems. He is the maintainer of several open-source Go libraries and is passionate about observability and developer tooling.

When he's not debugging production incidents, Alex writes about software architecture and the art of writing clean code.