
When we receive an HTTP call, we create a new trace id, and pass it to all microservices involved in dealing with that request and the subrequests that result from it.

This works fine until a flow calls an external service that delivers its response asynchronously via a webhook: we call them, they reply with 202, and they call our webhook endpoint when their response is ready. That webhook call is treated as a brand new HTTP call and gets a new trace id, so our trace is now split into two separate traces.

What's the best practice for restoring the previously interrupted trace?

Federico Fissore

1 Answer


You can solve this using context propagation, a technique for carrying trace data (such as the trace id and the parent span id) between services or processes. The example below propagates context to a worker process in OpenTelemetry:

// app.js
const opentelemetry = require("@opentelemetry/api");
const { W3CTraceContextPropagator } = require("@opentelemetry/core");

// Assumes `app`, `queue`, and a registered tracer provider are set up elsewhere.
const tracer = opentelemetry.trace.getTracer("app");

app.get("/", (req, res) => {
  const parentSpan = tracer.startSpan('send_to_queue');

  // inject the context into the carrier object and send
  // into the queue
  const propagator = new W3CTraceContextPropagator();
  const carrier = {};

  propagator.inject(
    opentelemetry.trace.setSpanContext(
      opentelemetry.ROOT_CONTEXT,
      parentSpan.spanContext()
    ),
    carrier,
    opentelemetry.defaultTextMapSetter
  );

  queue.send({foo: 'bar', carrier });

  parentSpan.end();
});

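// worker.js
const opentelemetry = require("@opentelemetry/api");
const { W3CTraceContextPropagator } = require("@opentelemetry/core");

// Assumes `channel`, `queue`, and `makeExternalRequest` are defined elsewhere.
const tracer = opentelemetry.trace.getTracer("worker");
const propagator = new W3CTraceContextPropagator();

channel.consume(queue, async function(payload) {
  // Extract the context from the queue payload and start
  // a child span, so the new span is attached to the
  // original parent span and stays in the same trace.
  const data = JSON.parse(payload.content.toString());
  const parentContext = propagator.extract(
    opentelemetry.ROOT_CONTEXT,
    data.carrier,
    opentelemetry.defaultTextMapGetter
  );

  // Pass the extracted context explicitly as the parent of the new span.
  await tracer.startActiveSpan('process_queue_item', {}, parentContext, async (span) => {
    try {
      await makeExternalRequest("http://google.com");
    } finally {
      // End the span even if the request throws.
      span.end();
    }
  });
});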
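To make the carrier less of a black box: with the W3C propagator, the injected carrier holds a `traceparent` entry of the form `00-<trace id>-<parent span id>-<flags>`. The sketch below builds and parses that string by hand, handling version `00` only. `formatTraceparent` and `parseTraceparent` are hypothetical helper names used for illustration, not part of any OpenTelemetry package, and the ids are sample values.

```javascript
// Accepts only version 00: 32 hex chars of trace id, 16 of span id, 2 of flags.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: build the header value from its parts.
function formatTraceparent({ traceId, spanId, sampled }) {
  const flags = sampled ? "01" : "00";
  return `00-${traceId}-${spanId}-${flags}`;
}

// Hypothetical helper: parse the header value, or return null if it is invalid.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero ids are invalid in the W3C Trace Context format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // The lowest flag bit marks the trace as sampled.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Roughly what a carrier looks like after propagator.inject(...).
const carrier = {
  traceparent: formatTraceparent({
    traceId: "0af7651916cd43dd8448eb211c80319c",
    spanId: "b7ad6b7169203331",
    sampled: true,
  }),
};

const parsed = parseTraceparent(carrier.traceparent);
```

Because the carrier is a plain object of strings, it can travel in a queue message, an HTTP header, or any other payload that survives serialization.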

In the same vein, you can propagate context to the external service and create spans there that become part of the same trace. Some examples of doing this in other tracing tools are:

  1. In Datadog for NodeJS, the trace API can also inject context into a carrier for cross-process propagation, using the text-map format:

     // app.js
     // inject context
     const opentracing = require("opentracing");
     const tracer = require('dd-trace').init();
    
     const span = tracer.scope().active();
     const carrier = {};
    
     tracer.inject(span, opentracing.FORMAT_TEXT_MAP, carrier);
    
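     // worker.js
     // extract context and start child span
     const opentracing = require("opentracing");
     const tracer = require('dd-trace').init();

     const parent = tracer.extract(opentracing.FORMAT_TEXT_MAP, payload.carrier);

     const childSpan = tracer.startSpan("the.child", { childOf: parent });

     // do the work here

     // dd-trace spans follow the OpenTracing API, so they are closed with finish()
     childSpan.finish();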
    
  2. Honeycomb has specific documentation for inter-process propagation.

     // app.js
     const beeline = require("honeycomb-beeline")();
    
     const traceContext = beeline.honeycomb.marshalTraceContext(beeline.getTraceContext());
    
     payload.traceContext = traceContext;
    
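     // worker.js
     const beeline = require("honeycomb-beeline")();

     const { traceId, parentSpanId, dataset, customContext } = beeline.honeycomb.unmarshalTraceContext(payload.traceContext);

     // "process_queue_item" is a placeholder name for the new trace's root span
     const trace = beeline.startTrace({ name: "process_queue_item" }, traceId, parentSpanId, dataset, customContext);

     // do the work here

     beeline.finishTrace(trace);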
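Applied to the webhook flow in the question, one possible approach is to store the injected carrier under a correlation id before calling the external service, and to look it up again when the webhook arrives. The following is a minimal sketch, not a definitive implementation. It assumes the external service echoes back an identifier you supply (for example in the callback URL), which the question does not confirm. `rememberTraceContext`, `restoreTraceContext`, and the in-memory `Map` are hypothetical names for illustration; a real deployment would need a store shared by all instances.

```javascript
const crypto = require("node:crypto");

// In-memory stand-in for a durable store (assumption: replace with a database or cache).
const pendingCarriers = new Map();

// Call right before the outbound request. `carrier` is the object
// filled in by the tracing library's inject call.
function rememberTraceContext(carrier) {
  const correlationId = crypto.randomUUID();
  pendingCarriers.set(correlationId, { ...carrier });
  // Send this id along with the request, e.g. as part of the callback URL.
  return correlationId;
}

// Call in the webhook handler with the id the external service sent back.
// Returns the stored carrier once, or null if the id is unknown.
function restoreTraceContext(correlationId) {
  const carrier = pendingCarriers.get(correlationId);
  pendingCarriers.delete(correlationId);
  return carrier ?? null;
}

// Usage: the outbound side stores the carrier, the webhook side restores it.
const id = rememberTraceContext({
  traceparent: "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
});
const restored = restoreTraceContext(id);
```

The restored carrier can then be passed to the tracing library's extract call, as in the worker examples above, so the span created for the webhook call joins the original trace instead of starting a new one.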
    
opeonikute