The Definitive Guide to Structured Logging in 2024
Unstructured logs are the silent killers of observability. Here is how to build a logging strategy that scales, survives outages, and gives you the signal you need.
Why unstructured logs fail at scale
In the early days of a startup, logging is simple. You have one service, one database, and a single developer on call. You can afford to log strings like "User 404 not found at 14:02" and grep them when things break.
But as you scale to microservices, distributed systems, and multiple teams, that approach collapses. You end up with millions of lines of text where the fields are inconsistent, the timestamps are fuzzy, and the search queries are brittle.
The result is the "grep-and-pray" debugging cycle. You spend more time filtering noise than finding the signal. This isn't just an inconvenience; it's a liability. When a critical error occurs in production, you need to know who triggered it, where in the stack it happened, and what the state was at that exact millisecond. Unstructured logs cannot provide that.
What is structured logging?
Structured logging is the practice of logging data as a sequence of key-value pairs, typically in JSON format. Instead of a narrative sentence, you log events as data that can be parsed, indexed, and queried programmatically.
For example, an unstructured log is a free-text line such as "User 404 not found at 14:02", where a parser cannot tell whether 404 is a user ID or a status code.
A structured log, however, looks like this:
{
  "timestamp": "2024-10-24T14:02:15Z",
  "level": "error",
  "service": "auth-api",
  "user_id": 9921,
  "error_code": "DB_TIMEOUT",
  "stack_trace": "..."
}
Notice that user_id is a number rather than a string, and error_code is a fixed constant rather than free text. Because every field is machine-readable, your observability platform can index and query each one without custom parsing rules.
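As a minimal sketch of how such an event could be emitted, the example below uses Go's standard log/slog package (Go 1.21+) with its JSON handler. Note that slog's default keys are time, level, and msg, and it writes levels in upper case, so the output will not match the sample above field for field. The helper names and the "login failed" message are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// newJSONLogger returns a logger that writes one JSON object per event to w.
func newJSONLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logAuthFailure emits the event as typed key-value pairs instead of a sentence.
// The message text is a hypothetical placeholder.
func logAuthFailure(logger *slog.Logger, userID int, errorCode string) {
	logger.Error("login failed",
		slog.String("service", "auth-api"),
		slog.Int("user_id", userID),
		slog.String("error_code", errorCode),
	)
}

// captureAuthFailure logs one event into a buffer and parses it back,
// the way an aggregator would when indexing the line.
func captureAuthFailure(userID int, errorCode string) (map[string]any, error) {
	var buf bytes.Buffer
	logAuthFailure(newJSONLogger(&buf), userID, errorCode)
	var event map[string]any
	err := json.Unmarshal(buf.Bytes(), &event)
	return event, err
}

func main() {
	event, err := captureAuthFailure(9921, "DB_TIMEOUT")
	if err != nil {
		panic(err)
	}
	// Every field is addressable by name; no regular expressions required.
	fmt.Println(event["service"], event["user_id"], event["error_code"])
}
```

Because user_id is logged with slog.Int, it arrives at the consumer as a JSON number, which is what makes numeric range queries possible later.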
Choosing a log schema
Not all structured logs are created equal. The power of structured logging comes from convention. If every service in your ecosystem logs a trace_id, you can correlate events across services. If they don't, you're back to square one.
Standard Fields
Every log event should include these mandatory fields:
- timestamp: ISO 8601 format (e.g., 2024-10-24T14:02:15Z).
- level: debug, info, warn, error.
- service: The name of the microservice emitting the log.
- environment: dev, staging, prod.
Contextual Fields
For distributed tracing, include:
- trace_id: A unique identifier for the request flow.
- span_id: The specific operation within the trace.
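One way to enforce a convention like this is to share a single event type across services. The sketch below is an assumption about how that could look in Go: the LogEvent type, its Validate method, and the exampleEvent helper are hypothetical names, and treating trace_id and span_id as optional is a design choice rather than a rule from the list above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// LogEvent is a hypothetical shared schema with the standard and contextual fields.
type LogEvent struct {
	Timestamp   time.Time `json:"timestamp"` // marshals as RFC 3339, an ISO 8601 profile
	Level       string    `json:"level"`
	Service     string    `json:"service"`
	Environment string    `json:"environment"`
	TraceID     string    `json:"trace_id,omitempty"`
	SpanID      string    `json:"span_id,omitempty"`
}

var (
	validLevels = map[string]bool{"debug": true, "info": true, "warn": true, "error": true}
	validEnvs   = map[string]bool{"dev": true, "staging": true, "prod": true}
)

// Validate reports the first mandatory field that is missing or not allowed.
func (e LogEvent) Validate() error {
	switch {
	case e.Timestamp.IsZero():
		return errors.New("timestamp is required")
	case !validLevels[e.Level]:
		return fmt.Errorf("invalid level %q", e.Level)
	case e.Service == "":
		return errors.New("service is required")
	case !validEnvs[e.Environment]:
		return fmt.Errorf("invalid environment %q", e.Environment)
	}
	return nil
}

// exampleEvent returns a fully populated event for demonstration purposes.
func exampleEvent() LogEvent {
	return LogEvent{
		Timestamp:   time.Date(2024, 10, 24, 14, 2, 15, 0, time.UTC),
		Level:       "error",
		Service:     "auth-api",
		Environment: "prod",
		TraceID:     "trace-123", // placeholder value
	}
}

func main() {
	event := exampleEvent()
	if err := event.Validate(); err != nil {
		panic(err)
	}
	line, err := json.Marshal(event)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(line))
}
```

Rejecting events at the source is stricter than fixing them in the pipeline, but it keeps the schema from drifting one service at a time.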
Context propagation
The most common mistake developers make is forgetting to pass context. If a request starts in the API Gateway and is routed to a Worker service without that context, the Worker service has no idea which user made the request or which trace it belongs to.
You must propagate context through your call stack and across service boundaries. Within a Go process, this is done via the context package; in other languages, use thread-local storage or dependency injection. Between services, carry the identifiers in request headers.
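import (
	"context"
	"net/http"
)

// ctxKey is an unexported type, so this key cannot collide with keys set by other packages.
type ctxKey string

const traceIDKey ctxKey = "trace_id"

func LoggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		traceID := r.Header.Get("X-Trace-ID")
		// Inject into context
		ctx := context.WithValue(r.Context(), traceIDKey, traceID)
		// Pass context to handler
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// processOrder is an illustrative name and signature; log is assumed to be a
// structured logger whose Info method takes a message followed by key-value pairs.
func processOrder(ctx context.Context, orderID string) {
	// Retrieve trace ID from context; the two-value form avoids a panic if it is missing
	traceID, _ := ctx.Value(traceIDKey).(string)
	log.Info("order.processed",
		"trace_id", traceID,
		"order_id", orderID,
		"status", "success",
	)
}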
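The middleware above covers the inbound side. For the Gateway-to-Worker hop, the trace ID also has to leave the process again. The following is a rough sketch of that outbound half, assuming the same X-Trace-ID header; the helper names and the worker.internal URL are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// traceKey is a private key type so the value cannot collide with other packages.
type traceKey struct{}

// withTraceID stores the trace ID in the context.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceIDFrom returns the stored trace ID, or "" when none was set.
func traceIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(traceKey{}).(string)
	return id
}

// newDownstreamRequest builds a request to another service and copies the
// trace ID from the context into the X-Trace-ID header.
func newDownstreamRequest(ctx context.Context, method, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, method, url, nil)
	if err != nil {
		return nil, err
	}
	if id := traceIDFrom(ctx); id != "" {
		req.Header.Set("X-Trace-ID", id)
	}
	return req, nil
}

// downstreamTraceHeader runs the round trip for a given trace ID and returns
// the header value the receiving service would see.
func downstreamTraceHeader(traceID string) (string, error) {
	ctx := withTraceID(context.Background(), traceID)
	// The URL is a placeholder; no request is actually sent.
	req, err := newDownstreamRequest(ctx, http.MethodPost, "http://worker.internal/jobs")
	if err != nil {
		return "", err
	}
	return req.Header.Get("X-Trace-ID"), nil
}

func main() {
	header, err := downstreamTraceHeader("trace-123")
	if err != nil {
		panic(err)
	}
	fmt.Println("X-Trace-ID sent downstream:", header)
}
```

In practice this logic usually lives in a shared HTTP client wrapper, so individual call sites cannot forget it.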
Sampling strategies
As traffic grows, logging every single request becomes expensive and noisy. You need sampling to reduce volume without losing the ability to debug errors.
1. Random Sampling
Log 100% of errors, but only 10% of info-level logs. This is the simplest strategy and works well for most services.
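A minimal sketch of this rule, assuming that every non-error level is sampled at the same rate (the text above only specifies info). The Sampler type is a hypothetical name, and the random source is injectable so the decision can be tested deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Sampler keeps every error event and a fixed fraction of all other events.
type Sampler struct {
	InfoRate float64        // fraction of non-error events to keep, from 0.0 to 1.0
	Rand     func() float64 // returns a value in [0.0, 1.0); replaceable in tests
}

// ShouldLog reports whether an event at the given level should be emitted.
func (s Sampler) ShouldLog(level string) bool {
	if level == "error" {
		return true // errors are never dropped
	}
	return s.Rand() < s.InfoRate
}

func main() {
	s := Sampler{InfoRate: 0.1, Rand: rand.Float64}

	kept := 0
	for i := 0; i < 10000; i++ {
		if s.ShouldLog("info") {
			kept++
		}
	}
	// The kept count varies per run but should land near 1000.
	fmt.Printf("kept %d of 10000 info events; error kept: %v\n", kept, s.ShouldLog("error"))
}
```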
2. Header-based Sampling
Allow upstream components (like a load balancer) to inject a header (e.g., X-Sample-Rate: 0.1) that the SDK respects.
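Here is one way the receiving side might read that header. This is a sketch, not any SDK's actual behavior: the function names are hypothetical, and falling back to a default rate on missing or malformed values is an assumption. The header should only be honored when it comes from infrastructure you control, since a client could otherwise turn your logging off.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
)

// parseSampleRate converts a raw header value into a rate between 0 and 1.
// Empty, malformed, or NaN values yield the fallback; out-of-range values are clamped.
func parseSampleRate(raw string, fallback float64) float64 {
	if raw == "" {
		return fallback
	}
	rate, err := strconv.ParseFloat(raw, 64)
	if err != nil || math.IsNaN(rate) {
		return fallback
	}
	return math.Min(1, math.Max(0, rate))
}

// sampleRateFromRequest reads the X-Sample-Rate header from an incoming request.
func sampleRateFromRequest(r *http.Request, fallback float64) float64 {
	return parseSampleRate(r.Header.Get("X-Sample-Rate"), fallback)
}

func main() {
	// The URL is a placeholder; the request is only built, never sent.
	req, err := http.NewRequest(http.MethodGet, "http://example.com/orders", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-Sample-Rate", "0.1")
	fmt.Println("effective sample rate:", sampleRateFromRequest(req, 1.0))
}
```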
3. Adaptive Sampling
If your error rate spikes, automatically increase the sampling rate to capture more context. When the error rate drops again, decrease it to save costs.
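The adjustment rule can be sketched as a pure function that runs once per evaluation window. Doubling and halving the rate, the threshold, and the bounds are all assumptions chosen for illustration; a production policy would likely smooth the error rate first.

```go
package main

import "fmt"

// nextSampleRate doubles the sampling rate while the observed error rate is
// above the threshold and halves it otherwise, staying within [minRate, maxRate].
func nextSampleRate(current, errorRate, threshold, minRate, maxRate float64) float64 {
	next := current / 2
	if errorRate > threshold {
		next = current * 2
	}
	if next < minRate {
		return minRate
	}
	if next > maxRate {
		return maxRate
	}
	return next
}

func main() {
	rate := 0.125
	// Hypothetical error rates observed over five consecutive windows.
	observed := []float64{0.01, 0.20, 0.30, 0.02, 0.01}
	for _, errorRate := range observed {
		rate = nextSampleRate(rate, errorRate, 0.05, 0.01, 1.0)
		fmt.Printf("error rate %.2f -> sample rate %.4f\n", errorRate, rate)
	}
}
```

The lower bound matters as much as the upper one: a rate that decays to zero leaves nothing to compare against when the next incident starts.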
Shipping logs to aggregators
Once your logs are structured, you need to get them to a central place. The choice of protocol and format depends on your infrastructure.
HTTP/JSON
The most common method. Simple to implement and works with any cloud provider. Ideal for short-lived containers.
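A minimal sketch of the HTTP/JSON path, assuming the aggregator accepts newline-delimited JSON in a POST body. The function names, the application/x-ndjson content type, and the idea of batching events are assumptions, not any particular aggregator's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// encodeBatch serializes events as newline-delimited JSON, one object per line.
func encodeBatch(events []map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends a newline after each value
	for _, event := range events {
		if err := enc.Encode(event); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// shipBatch POSTs an encoded batch to the aggregator and treats any
// non-2xx response as a failure so the caller can retry or buffer.
func shipBatch(client *http.Client, url string, body []byte) error {
	resp, err := client.Post(url, "application/x-ndjson", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("aggregator returned %s", resp.Status)
	}
	return nil
}

func main() {
	batch, err := encodeBatch([]map[string]any{
		{"level": "info", "service": "auth-api"},
		{"level": "error", "service": "auth-api", "error_code": "DB_TIMEOUT"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(batch))
	// shipBatch is not called here because no real endpoint exists in this
	// example; it would be invoked as shipBatch(http.DefaultClient, url, batch).
}
```

Batching several events per request amortizes connection overhead, which matters for short-lived containers that need to flush before they exit.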
gRPC
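Generally more efficient than JSON over HTTP/1.1 in high-throughput scenarios, because it multiplexes streams over HTTP/2 and uses binary Protobuf payloads by default. Requires a gRPC server on the aggregator side.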
Protobuf
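Protobuf is a serialization format rather than a transport, so it can be combined with either option above. If you are shipping millions of logs per second, binary formats like Protobuf are typically much smaller than JSON, reducing bandwidth usage.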
Stop guessing. Start logging.
LogKit provides the SDK and platform to make structured logging a breeze. No configuration, no pipelines, just clean, queryable data.
About the Author
Alex Chen is a Senior Backend Engineer at LogKit with over a decade of experience building distributed systems. He is the maintainer of several open-source Go libraries and is passionate about observability and developer tooling.
When he's not debugging production incidents, Alex writes about software architecture and the art of writing clean code.