Three Ways to Trace End-to-end

On the left, we see a trace with only client-side and server-side spans for a Lambda function trigger. On the right, we see more spans, including the ALB and outgoing requests to S3 and to a Redis server.
  • Being able to accept and/or propagate the distributed tracing context
  • Participating in the incoming trace and producing spans

Transforming the trace context

One of the common difficulties in distributed tracing is the lack of a single “standard” propagation header. The community has established options like B3 and W3C TraceContext, alongside various vendor-specific headers like X-Amzn-Trace-Id. Projects that are already instrumented to parse one of these formats are completely unaware of the others.

  • It is simple to implement if downstream services consistently support a single, different header.
  • If the converter is implemented as a proxy server, it doesn’t require any changes to the existing services.
  • When combined with a vendor-agnostic collection pipeline like the OpenTelemetry Collector, services can keep publishing trace spans in vendor-specific formats, and the collector can transform and send them to the same service. For example, if service A is using Jaeger and B is using Zipkin, the OpenTelemetry Collector can accept spans coming from both services and transform all the data to be sent to either Jaeger or Zipkin.
  • If trace headers are not convertible, for reasons like different identifier lengths or different fields, this option cannot be implemented.
  • If downstream services support a variety of different headers, this option is very complicated to implement and maintain.
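
As a rough illustration of what such a converter does, here is a minimal sketch that turns B3 multi-headers into a W3C `traceparent` value. The function name `b3_to_traceparent` is hypothetical, and the sketch assumes that left-padding a 64-bit B3 trace ID with zeros is acceptable for your backend; it ignores `X-B3-ParentSpanId` and `tracestate` entirely.

```python
import re

# B3 span IDs are 16 hex chars; B3 trace IDs are 16 or 32 hex chars.
_SPAN_ID = re.compile(r"[0-9a-f]{16}")
_TRACE_ID = re.compile(r"[0-9a-f]{16}|[0-9a-f]{32}")


def b3_to_traceparent(headers):
    """Return a W3C traceparent string for B3 multi-headers, or None.

    Hypothetical helper for illustration only; not part of any library.
    """
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    trace_id = lowered.get("x-b3-traceid", "")
    span_id = lowered.get("x-b3-spanid", "")

    if not _TRACE_ID.fullmatch(trace_id) or not _SPAN_ID.fullmatch(span_id):
        return None  # headers missing or malformed: not convertible
    if int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        return None  # all-zero IDs are invalid in W3C TraceContext

    # Assumption: a 64-bit B3 trace ID is left-padded to 128 bits.
    trace_id = trace_id.rjust(32, "0")

    # B3 signals sampling with X-B3-Sampled: 1 (or the debug flag).
    sampled = lowered.get("x-b3-sampled") == "1" or lowered.get("x-b3-flags") == "1"
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


incoming = {
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}
print(b3_to_traceparent(incoming))
```

The padding step is exactly where the "different identifier lengths" problem shows up: going from B3 to W3C is lossless, but a 128-bit W3C trace ID cannot be squeezed back into a 64-bit B3 ID without losing information.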

Linking traces

In distributed tracing, linking is a way to associate two or more traces. Even though not all the data is captured under the same trace, when you link traces, you can still navigate from one to the other.

A trace collected for serviceA.Lookup makes a request to serviceB.Query. The client span links to another trace that contains serviceB.Query’s server-side spans.
  • It doesn’t require any significant changes to the existing instrumentation and propagation formats.
  • Allows different data access levels. For example, sensitive traces can be split as a linked trace with different access levels.
  • Requires custom ways to propagate back the “linked trace” and record it.
  • Not many distributed tracing tools support visualizing or querying links. Even when querying is supported, it is very limited compared to having all the spans under the same trace.
  • It is hard to produce automatic service maps based on trace data; you may need to build custom solutions.
  • Propagating the downsampling decision is still a challenge unless all components are using the same trace header.
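
To make the idea concrete, here is a minimal sketch of linking using plain data classes rather than any real tracing library. The `Span` and `Link` classes and the `x-linked-trace` response header are all hypothetical; the header stands in for the "custom way to propagate back" mentioned above.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Link:
    """Points from one span to a span in a different trace."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    links: list = field(default_factory=list)


def service_b_query():
    """serviceB starts its own trace instead of joining the caller's."""
    server_span = Span("serviceB.Query", trace_id=secrets.token_hex(16))
    # Hypothetical response header carrying the identifiers back.
    response_headers = {
        "x-linked-trace": f"{server_span.trace_id}-{server_span.span_id}"
    }
    return server_span, response_headers


def service_a_lookup():
    """serviceA records a link to serviceB's trace on its client span."""
    client_span = Span("serviceA.Lookup", trace_id=secrets.token_hex(16))
    server_span, response_headers = service_b_query()

    linked_trace_id, linked_span_id = response_headers["x-linked-trace"].split("-")
    client_span.links.append(Link(linked_trace_id, linked_span_id))
    return client_span, server_span


client, server = service_a_lookup()
print(client.trace_id != server.trace_id)          # two separate traces
print(client.links[0].trace_id == server.trace_id)  # joined only by the link
```

The two spans never share a trace ID, so a backend has to understand links to let you jump from one trace to the other.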

Partial traces

One reasonable option is to give up on “end-to-end” and focus only on collecting partial traces from the services your group owns and maintains. Even though this is technically not a way to achieve “end-to-end”, it may help the organization understand the benefits of distributed tracing. Even without end-to-end traces, it’s still useful to see traces from a specific component to debug and identify issues. This provides a bottom-up approach where a team can communicate the value of distributed traces to the larger organization without having to convince the entire organization to invest in it.
