A beginner-friendly guide to what OpenTelemetry is and why it matters. Covers observability, the three pillars, instrumentation, and exporters - no code required.
Production is down. The checkout flow is broken. Customers can't complete orders.
You open the logs. There are thousands of lines - INFO, DEBUG, ERROR messages from a dozen services - and none of them clearly say this is where it broke. You start guessing. Was it the payment service? The inventory API? The database connection pool? You SSH into servers, tail logs, grep for exceptions, and hope something jumps out.
Meanwhile, your on-call rotation is burning, your boss is messaging you, and every minute of downtime has a dollar value attached to it.
This is the problem observability solves. Not "do we have logs?" - you have plenty of those. The real question is: "can we actually understand what our system is doing from the outside?" That's a very different thing.
If you've never set up observability before - if words like "distributed tracing" or "metrics cardinality" make you glaze over - this post is for you. We'll start from first principles and use everyday analogies to make the concepts stick. The code comes in Part 2.
Observability is a term borrowed from control theory. A system is "observable" if you can determine its internal state by examining its external outputs. For software, the question becomes: can you figure out what is happening inside your application by looking at the data it produces?
The classic alternative is monitoring - you set up dashboards and alerts for problems you know might happen. CPU over 90%? Alert. Error rate over 1%? Alert. Monitoring only tells you about problems you anticipated. Observability lets you ask new questions about your running system, even for failures you never predicted.
A good analogy is a visit to the doctor.
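When you feel unwell, the doctor doesn't open you up to look inside. Instead, they gather external signals and reason from those to what is going on inside. Software has three such signals: traces, metrics, and logs.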
OpenTelemetry is the industry-standard toolkit that lets your software produce all three signals in a consistent, vendor-neutral way. It is the second most active CNCF (Cloud Native Computing Foundation) project after Kubernetes - and that momentum is there for a reason.
OpenTelemetry (OTel) is a CNCF project born in 2019 from the merger of two earlier standards: OpenCensus (Google) and OpenTracing (community). The goal was to stop fragmenting the ecosystem. Before OTel, every vendor had their own SDK, their own agent, their own wire format. You'd instrument your app for Datadog, then decide to try Honeycomb, and find yourself rewriting everything.
OTel solves that by owning the instrumentation layer - the code that lives in your application and produces telemetry data. Where that data goes (Jaeger, Datadog, Azure Monitor, Honeycomb, Zipkin) is a separate concern, handled by swappable exporters. You change one line in Program.cs, not your entire codebase.
Think of it like a flight recorder - the black box on commercial aircraft. Every plane has one, and what it records follows a common standard.
The black box standard defines exactly what gets recorded, how it's encoded, and where it goes - so any airline (your application) can connect to any air traffic control system (any observability backend) without rebuilding the recorder. OTel is that standard for software.
One distinction worth locking in early: OpenTelemetry is not Jaeger. It is not Datadog. It is not your observability backend. Those tools store and visualize telemetry. OTel produces and ships it. This confusion derails a lot of developers when they first encounter the space.
Instrumentation is the act of adding code to your application that captures what is happening and turns it into data - traces, metrics, and logs.
Think of a busy kitchen in a restaurant. Without instrumentation, you just know that food goes in and plates come out. With instrumentation, you have a timer on every dish, a counter for every order, and a note every time something is returned. You can now answer questions you couldn't before: which dish takes the longest? When does the kitchen slow down? Which orders triggered a complaint?
There are two kinds of instrumentation:
Automatic instrumentation is when a library does the work for you. Packages like OpenTelemetry.Instrumentation.AspNetCore trace every incoming HTTP request, and OpenTelemetry.Instrumentation.Http traces every outgoing HttpClient call - without you writing a single line for each one. The library knows where the interesting things happen, so it instruments them on your behalf.
Manual instrumentation (or custom instrumentation) is when you do it yourself - for the parts that only you understand. No framework knows what a "PlaceOrder" operation is, or that you care about customer.id and order.id. That domain knowledge only lives in your code. In .NET, you reach for ActivitySource (for traces, in the System.Diagnostics namespace) and Meter (for metrics, in System.Diagnostics.Metrics). More on that in Part 2.
When you order something online, you can track it. You see: picked up at warehouse - arrived at regional hub - out for delivery - delivered. Each step has a timestamp and a duration. If the parcel goes missing, you know exactly which step failed.
Distributed tracing works the same way for requests in your software.
A trace is the complete record of a single request's journey - from the moment it hits your API, through every service call, database query, cache lookup, and background job it touches, until a response goes back to the caller. Each individual step is called a span.
Every trace carries a TraceId that ties the whole journey together across all services. Every span carries its own SpanId and a ParentSpanId that links it back to whichever span called it - allowing you to reconstruct the exact call tree. Spans can also carry attributes (key-value pairs like customer.id or http.status_code) and events (timestamped annotations like "cache miss").
In .NET, the native API for this is System.Diagnostics.Activity and System.Diagnostics.ActivitySource. Activity predates OpenTelemetry; ActivitySource was added in .NET 5 so that the built-in types line up with the OTel tracing API. In OTel terms: Activity = Span, ActivitySource = Tracer.
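If you want a small preview before Part 2, here is a minimal sketch of those IDs in action, using only types built into .NET. The source name Shop.Checkout, the span names, and the tag value are made up for illustration, and the bare ActivityListener stands in for the OpenTelemetry SDK, which would normally do the listening.

```csharp
using System;
using System.Diagnostics;

// "Shop.Checkout" is a hypothetical source name chosen for this sketch.
using var source = new ActivitySource("Shop.Checkout");

// Something has to listen, or StartActivity returns null. In a real app the
// OpenTelemetry SDK plays this role; here a bare listener records everything.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Shop.Checkout",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

// The second span starts while the first is current, so it becomes its child.
using (var parent = source.StartActivity("PlaceOrder"))
using (var child = source.StartActivity("ReserveInventory"))
{
    parent?.SetTag("customer.id", "c-123");            // attribute (made-up value)
    child?.AddEvent(new ActivityEvent("cache miss"));  // timestamped event

    if (parent is not null && child is not null)
    {
        Console.WriteLine($"TraceId      {child.TraceId}");
        Console.WriteLine($"SpanId       {child.SpanId}");
        Console.WriteLine($"ParentSpanId {child.ParentSpanId}");
        Console.WriteLine(child.TraceId == parent.TraceId);      // True
        Console.WriteLine(child.ParentSpanId == parent.SpanId);  // True
    }
}
```

The IDs themselves are random on every run; the part worth noticing is that both spans share one TraceId and the child points back at its parent.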
Your car doesn't send you a notification every time a cylinder fires or the alternator charges the battery. You get a dashboard: RPM, speed, fuel level, engine temperature. Numbers that summarize the state of the system so you can glance at them while driving.
Metrics do the same for your application.
Metrics are numbers measured or aggregated over time. Unlike traces (which follow individual requests) or logs (which capture individual events), metrics are cheap to store, fast to query, and easy to alert on. You don't record every request - you record the count, the rate, the percentile distribution.
The OTel metrics API defines several instrument types. The four you will reach for most often are: a Counter (only increases - total orders placed), an UpDownCounter (increases or decreases - active connections), a Histogram (distribution of values - request duration, giving you p50/p95/p99), and an ObservableGauge (polled at collection time - current memory usage). In .NET, these map directly to System.Diagnostics.Metrics.Meter and its family of types, available since .NET 6 (UpDownCounter arrived in .NET 7).
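As a rough preview of how those four instruments look in code, here is a sketch that uses only System.Diagnostics.Metrics. The meter and instrument names are invented for the example, and the MeterListener is a stand-in for the OpenTelemetry SDK, which would normally collect and aggregate these measurements for you. It assumes .NET 7 or later because of UpDownCounter.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, for illustration only.
using var meter = new Meter("Shop.Checkout");

var ordersPlaced      = meter.CreateCounter<long>("orders.placed");
var activeConnections = meter.CreateUpDownCounter<long>("connections.active"); // .NET 7+
var requestDuration   = meter.CreateHistogram<double>("request.duration", unit: "ms");
var memoryGauge       = meter.CreateObservableGauge<long>(
    "process.memory.bytes", () => GC.GetTotalMemory(false)); // polled, not pushed

// Stand-in for the SDK: collect every raw measurement into a list.
var seen = new List<(string Name, double Value)>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Shop.Checkout") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>(
    (instrument, value, tags, state) => seen.Add((instrument.Name, value)));
listener.SetMeasurementEventCallback<double>(
    (instrument, value, tags, state) => seen.Add((instrument.Name, value)));
listener.Start();

ordersPlaced.Add(1);            // counter: only goes up
ordersPlaced.Add(1);
activeConnections.Add(1);       // up-down counter: connection opened...
activeConnections.Add(-1);      // ...and closed
requestDuration.Record(42.5);   // histogram: one observed duration

listener.RecordObservableInstruments(); // triggers the gauge callback once

foreach (var (name, value) in seen)
    Console.WriteLine($"{name} = {value}");
```

Note that the listener here sees raw measurements. Turning them into sums, rates, and percentiles is the aggregation work the SDK and your backend do afterwards.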
A ship's captain keeps a logbook. Every notable event goes in, in order: "14:32 - departed port", "15:10 - storm began", "16:45 - storm passed". A chronological record of what happened and when. When something goes wrong, you read the logbook.
Your application's logs work the same way. The problem in distributed systems is context: you have log lines from five services, all interleaved by timestamp, with no way to tell which lines belong to the same user request.
OTel solves this cleanly. When you emit a log inside an active span, OTel automatically injects the TraceId and SpanId into the log entry. You can jump from a single log line to the full trace it belongs to.
And here is the best part for existing .NET codebases: if you already use Microsoft.Extensions.Logging (ILogger), you do not need to change any of your existing log calls. Add the OTel logging exporter once, and all your log calls automatically carry trace context from that point forward.
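To make that concrete, here is roughly what the one-time setup can look like in a minimal ASP.NET Core app. Treat it as a sketch under assumptions: it presumes the OpenTelemetry.Exporter.Console NuGet package is referenced (it pulls in the core OpenTelemetry package), and the /orders/{id} endpoint is a made-up example. Part 2 covers the real wiring.

```csharp
// Program.cs - sketch only; assumes the OpenTelemetry.Exporter.Console package.
using OpenTelemetry.Logs;

var builder = WebApplication.CreateBuilder(args);

// Registers OpenTelemetry as an ILogger provider. Existing log calls are untouched.
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeScopes = true;
    logging.AddConsoleExporter(); // prints log records, including trace context when present
});

var app = builder.Build();

// Hypothetical endpoint: an ordinary ILogger call, written as it always was.
app.MapGet("/orders/{id}", (string id, ILogger<Program> logger) =>
{
    logger.LogInformation("Looking up order {OrderId}", id);
    return Results.Ok(id);
});

app.Run();
```

The log call inside the endpoint knows nothing about tracing. The provider picks up the current Activity, if there is one, and stamps its TraceId and SpanId onto the record.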
An exporter is the component that takes the telemetry data your application produces and ships it somewhere useful.
Think of it as a courier service. Your application boxes up the data - spans, metrics, log entries. The exporter is the courier that picks up those boxes and delivers them to wherever you want: Jaeger running on your laptop, a managed Datadog account, Azure Monitor, whatever fits your setup.
Because OTel separates the producing step from the shipping step, you can swap exporters without touching your instrumentation code. This is the whole point of vendor neutrality.
Switching from Jaeger to Datadog? You change the exporter configuration - one or two lines in Program.cs. The rest of your instrumentation code stays identical.
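Here is a sketch of what that exporter swap can look like. It assumes the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore, OpenTelemetry.Instrumentation.Http, OpenTelemetry.Exporter.Console, and OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet packages; the service name checkout-api and the source name Shop.Checkout are hypothetical.

```csharp
// Program.cs - sketch only; package list and names are assumptions (see above).
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("checkout-api"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()   // incoming HTTP requests
        .AddHttpClientInstrumentation()   // outgoing HttpClient calls
        .AddSource("Shop.Checkout")       // your own ActivitySource, if you have one
        // The destination is the only part that changes:
        // .AddConsoleExporter()          // print spans locally while developing
        .AddOtlpExporter());              // send OTLP to a collector or an OTLP-capable backend

var app = builder.Build();
app.MapGet("/", () => "ok");
app.Run();
```

Everything above the exporter line describes what to capture; only the last call says where it goes. With OTLP, even changing backends can come down to pointing the exporter at a different endpoint.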
"OpenTelemetry replaces Jaeger / Datadog / Prometheus" No. OTel is the instrumentation and wire format layer. Jaeger, Datadog, and Prometheus are the storage, query, and visualization layer. They complement each other. OTel means you are not locked into any single backend.
"I will have to rewrite all my logging code"
No. OTel integrates with Microsoft.Extensions.Logging. Your existing ILogger calls get trace context automatically once you add the OTel logging exporter. You do not touch the call sites.
"Tracing is only useful for microservices"
Monoliths benefit too - sometimes more immediately. Slow database queries, external API latency, CPU-heavy code paths - traces expose all of this in a single-process application. If you have a database and an HttpClient, tracing is already useful.
"The performance overhead is not worth it" OTel sampling lets you record 1% of traces in production with near-zero overhead. You configure the sampling ratio. Use 100% in development. The SDK is designed to be a no-op when no exporter is attached.
"I need a paid service" Jaeger is free and open source. One Docker command and it is running. Prometheus is free. The OTel Collector is free. You only pay for a managed service if you decide you no longer want to run the infrastructure yourself - and that is a problem for later.
Part 2 picks up from here with the hands-on side: wiring up OpenTelemetry in an ASP.NET Core application from scratch, running Jaeger locally in one Docker command, and adding custom spans and metrics that reflect your own business domain. If the concepts clicked here, the code in Part 2 will feel obvious.
Have you started thinking about observability for your .NET services? Drop a comment or reach me on LinkedIn or X - I'd love to hear where you are in the journey.
For more .NET deep-dives on ASP.NET Core, distributed systems, and architecture, check out the rest of the posts on irina.codes.