I’ve been thinking about distributed systems lately. Not the theory, but the reality. You build a service, then another, then ten more. Suddenly, a user reports an error. Which service failed? Where did the request slow down? The old logs are a sea of timestamps with no map. This is why I started looking at tracing. It’s not just a tool; it’s a way to see the story of a request as it travels through your entire system. Let’s build that map together.
Think of a request to load a user profile. It might hit an API gateway, then a user service, which calls a database and a separate payment service. Without tracing, each service logs in isolation. You see a database error in one log and a timeout in another, but connecting them is guesswork. How do you know they’re part of the same user’s failed request?
The answer is a trace. A trace is the full journey of a single operation. It’s made of spans, which are individual units of work, like a database query or an HTTP call. Spans are nested and linked, creating a timeline. This structure shows you the “what” and the “when” across service boundaries. It turns a pile of logs into a coherent narrative.
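To make that concrete, here is a minimal sketch of the idea in plain JavaScript. The service names and timings are invented for illustration; real spans carry IDs, attributes, and high-resolution timestamps.

```javascript
// A trace is a tree of spans. Each span is one unit of work with a start
// and end time; children represent work done on behalf of the parent.
const trace = {
  traceId: 'abc123',
  rootSpan: {
    name: 'GET /users/42',                 // API gateway
    startMs: 0, endMs: 180,
    children: [
      { name: 'user-service: fetch-user', startMs: 5, endMs: 170, children: [
        { name: 'db: SELECT users',       startMs: 10, endMs: 60,  children: [] },
        { name: 'payment-service call',   startMs: 65, endMs: 160, children: [] },
      ]},
    ],
  },
};

// Walking the tree yields the request timeline across services.
function durations(span, out = []) {
  out.push(`${span.name}: ${span.endMs - span.startMs}ms`);
  span.children.forEach(c => durations(c, out));
  return out;
}

console.log(durations(trace.rootSpan).join('\n'));
```

This is exactly the structure a tracing UI renders as a waterfall: parents above, children indented below, widths proportional to duration.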
So, how do we start? We need a common language for our services to speak about traces. That’s where OpenTelemetry comes in. It’s a set of APIs, SDKs, and tools. It lets you generate, collect, and export telemetry data—traces, metrics, logs—in a standard way. The best part? You’re not locked into one vendor. You can send your data to Jaeger, Zipkin, or a cloud provider.
Let’s set it up for a Node.js service. First, install the packages.
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
Now, create a tracing.js file. This initializes the OpenTelemetry SDK before your app starts. It’s crucial this runs first.
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'user-service',
});

const traceExporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource,
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
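One detail worth adding to tracing.js: spans are exported in batches, so a process that exits abruptly can drop its last traces. A small shutdown hook flushes them. This sketch assumes the `sdk` object defined above.

```javascript
// Flush pending spans before the process exits so the final traces survive.
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing terminated'))
    .catch((err) => console.error('Error shutting down tracing', err))
    .finally(() => process.exit(0));
});
```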
In your main application file, import this tracing module at the very top.
// index.js - This must be the first line
require('./tracing');

const express = require('express');
const app = express();

app.get('/users/:id', async (req, res) => {
  // Your code here. HTTP calls will be auto-traced.
  res.json({ id: req.params.id });
});

app.listen(3000);
With just that, you have automatic instrumentation for incoming and outgoing HTTP calls. The SDK hooks into modules like http and express. It creates spans for each request, capturing details like the URL, method, and status code. But what about your own business logic?
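One practical note before moving on: auto-instrumentation covers a lot of modules, and some are noisy. You can tune it per instrumentation via the config object that `getNodeAutoInstrumentations` accepts. A sketch, disabling the filesystem instrumentation, which tends to flood traces on startup:

```javascript
// In tracing.js, instead of getNodeAutoInstrumentations() with no arguments:
const instrumentations = getNodeAutoInstrumentations({
  // Disable fs spans; they rarely add signal for web services
  '@opentelemetry/instrumentation-fs': { enabled: false },
});
```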
Automatic instrumentation gives you the skeleton. For the muscle, you add custom spans. Let’s say your /users/:id endpoint calls a database and an external API. You want to see how long each takes.
const opentelemetry = require('@opentelemetry/api');
const tracer = opentelemetry.trace.getTracer('user-service-tracer');

app.get('/users/:id', async (req, res) => {
  const userId = req.params.id;
  let user;
  // Start a custom span for the main business operation
  const span = tracer.startSpan('fetch-user-data');
  try {
    // Add useful details to the span
    span.setAttribute('user.id', userId);
    // This simulated database call is now a child span
    user = await fetchUserFromDB(userId, span);
    // And so is this external call
    await fetchUserProfilePicture(userId, span);
    span.setStatus({ code: opentelemetry.SpanStatusCode.OK });
    res.json(user);
  } catch (error) {
    // Record the error on the span
    span.recordException(error);
    span.setStatus({ code: opentelemetry.SpanStatusCode.ERROR });
    res.status(500).send('Error');
  } finally {
    // Always end the span
    span.end();
  }
});
async function fetchUserFromDB(userId, parentSpan) {
  // Start a child span for this operation, parented to the span we were passed
  const ctx = opentelemetry.trace.setSpan(opentelemetry.context.active(), parentSpan);
  const span = tracer.startSpan('database-query', {
    attributes: { 'db.query': 'SELECT * FROM users' },
  }, ctx);
  await new Promise(resolve => setTimeout(resolve, 50)); // Simulate DB delay
  span.end();
  return { id: userId, name: 'Test User' };
}

async function fetchUserProfilePicture(userId, parentSpan) {
  // Same pattern for the external call, so the example runs end to end
  const ctx = opentelemetry.trace.setSpan(opentelemetry.context.active(), parentSpan);
  const span = tracer.startSpan('fetch-profile-picture', {}, ctx);
  await new Promise(resolve => setTimeout(resolve, 30)); // Simulate HTTP delay
  span.end();
}
See how we nest spans? The fetch-user-data span is the parent. The database-query span is its child. This creates a hierarchy in your trace view. But here’s a key question: when this service calls another, how does the trace continue there? The magic is in context propagation.
When Service A calls Service B, it must send the trace context along. This is usually done via HTTP headers, specifically the W3C Trace Context headers traceparent and tracestate. OpenTelemetry's auto-instrumentation handles this for you: the outgoing request from Service A gets those headers injected, and Service B's auto-instrumentation reads them and continues the same trace. The span in Service B becomes a child of the calling span in Service A. This links the entire chain, even across different languages and frameworks.
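You can see the mechanism with your own eyes by parsing a traceparent header. The value below is the example from the W3C Trace Context spec; the parser is a simplified sketch, not the library's internal one.

```javascript
// traceparent format: version - trace-id (32 hex) - parent span-id (16 hex) - flags
const traceparent = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';

function parseTraceparent(header) {
  const [version, traceId, spanId, flags] = header.split('-');
  return { version, traceId, spanId, sampled: flags === '01' };
}

const ctx = parseTraceparent(traceparent);
console.log(ctx.traceId); // 32 hex chars identifying the whole trace
```

Every service that receives this header attaches its spans to the same trace-id, which is how the chain stays connected.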
You need somewhere to send all this trace data to visualize it. Let’s use Jaeger, a popular open-source tool. Run it locally with Docker.
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
The exporter in tracing.js already points to Jaeger's OTLP endpoint at http://localhost:4318. Now, run your service and make a request to http://localhost:3000/users/123. Open your browser to http://localhost:16686. You should see your service listed. Search for traces and click on one. You'll see a timeline of spans: a visual story of that request.
What happens in production with thousands of requests per second? You can't trace every single one; the volume of data would be huge and expensive. This is where sampling comes in: you keep only a fraction of traces. A common strategy is head-based sampling, where the decision is made at the start of a trace, say, keep 10% of all requests. Note that head-based sampling cannot guarantee you keep 100% of errors, because you don't yet know a request will fail when it starts; that requires tail-based sampling, where a collector makes the decision after the trace completes.
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1), // Sample 10% of new traces
});
Pass this sampler to your NodeSDK configuration. This controls the data flow without changing your instrumentation code. It’s a balance between insight and cost.
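Concretely, that means adding one field to the NodeSDK constructor in tracing.js; the other options stay as before.

```javascript
// tracing.js, now with sampling wired in
const sdk = new NodeSDK({
  resource,
  traceExporter,
  sampler,
  instrumentations: [getNodeAutoInstrumentations()],
});
```

The ParentBased wrapper matters here: it respects the sampling decision carried in the incoming traceparent header, so a trace sampled upstream stays complete downstream.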
As you add more services, you’ll start to see the real value. You can pinpoint why a page is slow. Is it the database? A specific microservice? An external API? You can see error rates per endpoint and track how a deployment changed performance. It moves you from reactive debugging to proactive observation.
Start simple. Instrument one service. See the traces. Then connect a second. The clarity it brings is often surprising. You begin to understand your system’s actual behavior, not just its intended design.
I hope this guide helps you bring that clarity to your own projects. If you found it useful, please share it with your team or leave a comment about your experience with tracing. What was the most surprising thing you found in your traces?