9 Best Distributed Tracing Tools for Developers

Decide Among the Best Tools for Distributed Tracing in Your Backend Microservices Architecture

Published in JavaScript in Plain English

9 min read · Feb 8, 2023

Before the rise of microservices, engineering organizations built their business-critical applications with a monolithic architecture. This often involved basic infrastructure patterns, such as a single application server communicating with a single database. An example of a monolithic application is illustrated below.

Monolithic architecture, while simple and easy for developers to understand, came with its own set of drawbacks. Applications were large and inflexible, which led to longer deployment times and a lack of scalability. Every upgrade required the entire application to be redeployed, and code conflicts could occur when multiple developers worked on the same codebase. It was also difficult to scale specific parts of the application, because the entire application had to be scaled together.

To address these issues, developers began to decompose monolithic applications into smaller pieces called microservices. This approach divides the application into separate services, each focusing on solving a specific problem. It also lets the application infrastructure be distributed across different workloads, platforms, data centers, and cloud providers. The result is faster release times and the ability to scale different parts of the system independently. Additionally, developers can work on individual services in isolation, which increases efficiency and reduces conflicts.

Figure — Distributed application architecture

However, this distributed architecture made log monitoring more difficult and complex. Because the logs for a single transaction were spread across several services, developers could only see a small part of the overall picture. This limited understanding of how the system behaved led to delayed releases and had a negative impact on the organization.

Traditional monitoring tools designed for monolithic applications were unable to provide clear insight into the behavior and performance of distributed systems. Therefore, developers needed a new method for monitoring distributed services, and this is where tracing comes in.

What is Distributed Tracing?

Distributed Tracing is generally defined as following a single request through a multi-service or distributed architecture.

For example, if you were managing a serverless solution on AWS, you’d have a flow as shown below.

Figure — Distributed multi-service architecture on AWS

The illustration above shows a distributed multi-service architecture running on AWS that uses services such as API Gateway, Lambda, DynamoDB, DynamoDB Streams, and SNS Topic Subscriptions. To understand what is happening within such a system, you need to capture logs at each of these components rather than in a single place.

Distributed tracing can be used to understand the performance of a specific service within the larger distributed application. For example, in the illustration above, distributed tracing can be used to track how the Lambda function processes user data received from the API Gateway.

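To fully understand tracing, it is crucial to know how a trace is created. A trace can be broken down into spans, each of which represents a single operation within the trace (such as an HTTP call or a database query). Every span carries the ID of the trace it belongs to and, except for the first (root) span, the ID of its parent span, so the spans of a trace form a tree that mirrors the path of the request. Spans are typically associated with individual URIs or services that participate in the larger request context, such as authentication.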
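To make the relationship between traces and spans concrete, here is a minimal TypeScript sketch that reassembles the flat list of spans reported by individual services into a tree. The `SpanRecord` shape, the `buildTraceTree` and `renderTree` helpers, and the sample spans (loosely modeled on the AWS flow above) are hypothetical; real tracers such as the OpenTelemetry SDKs define their own span formats.

```typescript
// Hypothetical span shape: every span carries its trace's ID, its own ID,
// and (except for the root) the ID of the span that called it.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  service: string;
  name: string;
}

interface SpanNode {
  span: SpanRecord;
  children: SpanNode[];
}

// Rebuild one trace's tree from the flat list of spans its services reported.
// Spans whose parent was not reported are returned as roots.
function buildTraceTree(spans: SpanRecord[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.spanId, { span, children: [] });
  }
  const roots: SpanNode[] = [];
  for (const span of spans) {
    const node = nodes.get(span.spanId)!;
    const parent =
      span.parentSpanId !== undefined ? nodes.get(span.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Render the tree as indented lines, one line per span.
function renderTree(nodes: SpanNode[], depth = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push(`${"  ".repeat(depth)}${node.span.service}: ${node.span.name}`);
    lines.push(...renderTree(node.children, depth + 1));
  }
  return lines;
}

const exampleSpans: SpanRecord[] = [
  { traceId: "t1", spanId: "a", service: "api-gateway", name: "POST /users" },
  { traceId: "t1", spanId: "b", parentSpanId: "a", service: "lambda", name: "processUser" },
  { traceId: "t1", spanId: "c", parentSpanId: "b", service: "dynamodb", name: "PutItem" },
];

console.log(renderTree(buildTraceTree(exampleSpans)).join("\n"));
// api-gateway: POST /users
//   lambda: processUser
//     dynamodb: PutItem
```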

Figure — Tracing within a distributed microservice architecture

As shown in the above diagram, a trace context is passed from one service (process/span) to the next in your distributed architecture so that a user request can be tracked across multiple services. As a result, you can see how a user request performs across all of its spans in one view, instead of piecing the picture together from multi-page dashboards.
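One widely used format for this trace context is the W3C Trace Context `traceparent` header (`00-<trace-id>-<parent-id>-<trace-flags>`), which OpenTelemetry propagates by default. The sketch below, assuming that format, parses an incoming header and derives the context a service would forward on its own outgoing calls: the trace ID stays the same while the span ID changes. The function names are illustrative, the sketch only accepts version `00`, and in practice an instrumentation library handles this for you.

```typescript
// Trace context carried between services (hypothetical shape).
interface TraceContext {
  traceId: string; // 32 lowercase hex chars, shared by every span in the trace
  spanId: string; // 16 lowercase hex chars, identifies the current span
  sampled: boolean; // bit 0 of the trace-flags field
}

// This sketch only accepts version "00" of the header.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, traceId, spanId, flags] = m;
  // All-zero trace or parent IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Random hex ID of `bytes` bytes (illustrative helper; SDKs generate IDs for you).
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// A downstream service continues the trace: same trace ID, new span ID.
function startChildSpan(incoming: TraceContext): TraceContext {
  return { ...incoming, spanId: randomHex(8) };
}

const incoming = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
if (incoming) {
  // Header this service would send on its own outgoing requests,
  // e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-<new span id>-01
  console.log(formatTraceparent(startChildSpan(incoming)));
}
```

Because every hop keeps the trace ID and records the caller's span ID as its parent, a tracing backend can later stitch the spans back into a single trace.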

Best Tools For Distributed Tracing in Microservices

There are dozens of distributed tracing tools out there, so it’s important to understand the differences between them and select the one that best fits your needs.

Here are the top 9 distributed tracing tools for microservices.

1. Helios

Helios is a developer platform that provides meaningful insights into your end-to-end application flows by adapting OpenTelemetry’s context propagation framework to connect the dots.

Key features of Helios

Provides a single source of truth for how data flows through your entire application in any environment.
Offers E2E visibility into your system across microservices, serverless functions, databases, and 3rd party APIs, enabling you to quickly identify, reproduce and resolve issues.
Shows distributed tracing information in full context.
Offers integration with your existing ecosystem, such as logs, tests, and error monitoring.
Enables you to reproduce the exact flow, including HTTP requests, Kafka and RabbitMQ messages, and Lambda invocations, in a couple of clicks.

Benefits of Helios

You can visualize complex sync and async flows (HTTP requests, gRPC calls, serverless invocations, messaging queues, event streams, and more).
You can pinpoint bottlenecks and broken flows in your application in minutes.
You can filter errors by service, API calls, message queues, and streams with extensive search capabilities.
Installation only takes a few minutes.
You can generate test code from any flow in a few clicks and validate any operation, from database operations to 3rd party API calls.

You can get started with the Helios free tier to try its features on your distributed production workloads, or experiment with the tool in its sandbox.

Figure: Helios Sandbox for Tracing Tryout

Additionally, if you want to see how a trace (for example, one collected with Jaeger) looks when visualized, Helios offers a free trace visualization tool based on OpenTelemetry.

Figure: Trace Visualization Tool offered by Helios

2. Lightstep

Lightstep is a cloud-agnostic tool that offers full-context distributed tracing across a distributed microservices architecture or a complex multi-cloud environment. It is particularly well suited to complex systems.

Lightstep offers a free plan suitable for any development team looking to get started with the tool. It offers data ingestion, analysis, monitoring, and more.

Features of Lightstep

Offers complete system visibility: It provides end-to-end visibility across an application, allowing developers to see and monitor the entire picture within a distributed environment.
Offers instant insights: Lightstep quickly surfaces insights from traces to help developers understand the reason(s) behind performance issues.
Granular visibility: Lightstep lets developers pinpoint the exact source of issues across the stack.

Benefits of Lightstep

It has no vendor lock-in, meaning you can deploy your application in another cloud without losing your Lightstep tracing.
It also offers observability solutions, ensuring that developers can monitor the system’s internal state.

3. Zipkin

Zipkin is an open-source distributed tracing system initially developed at Twitter to gather timing data needed to troubleshoot latency problems in service architectures. It is straightforward to set up with the Docker command shown below.

docker run -d -p 9411:9411 openzipkin/zipkin

First, developers need to instrument their services with a Zipkin-compatible tracer. Then, for each request, the tracer assigns a unique trace ID, which lets Zipkin identify all the spans that a collection of services reports for that request. This ID is required for data collection and lookup.
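As a sketch of what instrumented services send to Zipkin, the helper below builds a span in Zipkin's v2 JSON format (timestamps and durations in microseconds) and posts it to the `/api/v2/spans` endpoint of a local server started with the Docker command above. The `toZipkinSpan` and `reportSpans` names are hypothetical, and the code assumes Node 18+ for the global `fetch`; in practice a Zipkin-compatible tracer library builds and sends these spans for you.

```typescript
// Subset of Zipkin's v2 span model used in this sketch.
interface ZipkinSpan {
  traceId: string;
  id: string;
  parentId?: string;
  name: string;
  kind?: "CLIENT" | "SERVER" | "PRODUCER" | "CONSUMER";
  timestamp: number; // start time, epoch microseconds
  duration: number; // microseconds
  localEndpoint: { serviceName: string };
  tags?: Record<string, string>;
}

// Hypothetical helper: convert millisecond timings into a Zipkin v2 span.
function toZipkinSpan(opts: {
  traceId: string;
  spanId: string;
  parentId?: string;
  service: string;
  name: string;
  startMs: number;
  endMs: number;
  kind?: ZipkinSpan["kind"];
  tags?: Record<string, string>;
}): ZipkinSpan {
  const span: ZipkinSpan = {
    traceId: opts.traceId,
    id: opts.spanId,
    name: opts.name,
    timestamp: Math.round(opts.startMs * 1000),
    duration: Math.round((opts.endMs - opts.startMs) * 1000),
    localEndpoint: { serviceName: opts.service },
  };
  // Optional fields are only set when provided.
  if (opts.parentId) span.parentId = opts.parentId;
  if (opts.kind) span.kind = opts.kind;
  if (opts.tags) span.tags = opts.tags;
  return span;
}

// POST a batch of spans to a Zipkin server (assumed to run on localhost:9411).
async function reportSpans(spans: ZipkinSpan[], baseUrl = "http://localhost:9411"): Promise<void> {
  const res = await fetch(`${baseUrl}/api/v2/spans`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(spans),
  });
  if (!res.ok) throw new Error(`Zipkin rejected spans: HTTP ${res.status}`);
}

const exampleSpan = toZipkinSpan({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  service: "users-service",
  name: "get /users",
  kind: "SERVER",
  startMs: 1675843200000,
  endMs: 1675843200012,
});
console.log(JSON.stringify([exampleSpan]));
// reportSpans([exampleSpan]).catch(console.error); // needs a running Zipkin server
```

Once spans are reported, Zipkin can return every span of the trace from `GET /api/v2/trace/{traceId}`, which is the lookup the trace ID makes possible.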

Features of Zipkin

Integration with Elasticsearch as a storage backend for efficient trace searching.
It lets you query traces by conditions such as duration and service name.
It computes data such as the percentage of time a request spent in a service and the success rate of an operation.
It is open source, so developers can browse through its implementation and even fix its bugs (if any).
It is easy to set up. All you need is a Docker command followed by minimal code instrumentation.
Its built-in UI offers limited data visualization. Hence, you may need to integrate tools such as Grafana for better visualization.
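The duration and service-name queries mentioned above are also available over Zipkin's HTTP API. The sketch below builds a search URL for `GET /api/v2/traces` using the `serviceName`, `minDuration` (in microseconds), and `limit` query parameters; `buildTraceQueryUrl` is a hypothetical helper, and the base URL assumes the Docker setup shown earlier.

```typescript
// Hypothetical helper: search URL for traces of one service slower than a threshold.
function buildTraceQueryUrl(
  baseUrl: string,
  serviceName: string,
  minDurationMs: number,
  limit = 10,
): string {
  const params = new URLSearchParams({
    serviceName,
    // Zipkin expects minDuration in microseconds.
    minDuration: String(Math.round(minDurationMs * 1000)),
    limit: String(limit),
  });
  return `${baseUrl}/api/v2/traces?${params.toString()}`;
}

console.log(buildTraceQueryUrl("http://localhost:9411", "users-service", 250));
// http://localhost:9411/api/v2/traces?serviceName=users-service&minDuration=250000&limit=10

// With Node 18+ and a running server, the response is an array of traces,
// each of which is an array of spans:
// const traces = await (await fetch(buildTraceQueryUrl("http://localhost:9411", "users-service", 250))).json();
```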

If open-source is a requirement for you, consider using Zipkin when implementing distributed tracing.

4. Jaeger Tracing

Jaeger Tracing is an open-source, end-to-end distributed tracing tool. It helps developers monitor and troubleshoot transactions in complex, microservices-based distributed systems.

Features of Jaeger Tracing

Distributed tracing monitoring.
Root cause analysis to help identify the key performance and latency bottlenecks across a trace.
It supports Elasticsearch for data persistence, which can be combined with the pre-built Jaeger UI to filter traces by service, duration, and tags.
It exposes Prometheus metrics by default to help derive meaningful insights.
However, it does not flag anomalous traces, so developers must analyze traces across microservices themselves.
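Because the analysis of individual traces is left to the developer, a simple heuristic can help locate bottlenecks: a span's self time, meaning the part of its duration not covered by any of its children. The sketch below is not Jaeger's root cause analysis; `TimedSpan`, `selfTimes`, `slowestBySelfTime`, and the sample trace are hypothetical, with times in milliseconds. Overlapping child spans (parallel calls) are merged so their time is not subtracted twice.

```typescript
// Minimal span shape with timing (hypothetical; times in milliseconds).
interface TimedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  start: number;
  end: number;
}

// Self time = span duration minus the union of its children's intervals.
function selfTimes(spans: TimedSpan[]): Map<string, number> {
  const children = new Map<string, TimedSpan[]>();
  for (const s of spans) {
    if (s.parentSpanId === undefined) continue;
    const list = children.get(s.parentSpanId) ?? [];
    list.push(s);
    children.set(s.parentSpanId, list);
  }
  const result = new Map<string, number>();
  for (const s of spans) {
    // Clip child intervals to the parent and sort them by start time.
    const intervals = (children.get(s.spanId) ?? [])
      .map((c) => [Math.max(c.start, s.start), Math.min(c.end, s.end)] as [number, number])
      .filter(([a, b]) => b > a)
      .sort((x, y) => x[0] - y[0]);
    let covered = 0;
    let runStart = -Infinity;
    let runEnd = -Infinity;
    for (const [a, b] of intervals) {
      if (a > runEnd) {
        // Disjoint from the current run: close it and start a new one.
        if (runEnd > runStart) covered += runEnd - runStart;
        runStart = a;
        runEnd = b;
      } else {
        runEnd = Math.max(runEnd, b); // overlapping child: extend the run
      }
    }
    if (runEnd > runStart) covered += runEnd - runStart;
    result.set(s.spanId, s.end - s.start - covered);
  }
  return result;
}

// The span where the request spent the most time doing its own work.
function slowestBySelfTime(spans: TimedSpan[]): TimedSpan | undefined {
  const selfMs = selfTimes(spans);
  let best: TimedSpan | undefined;
  for (const s of spans) {
    if (!best || selfMs.get(s.spanId)! > selfMs.get(best.spanId)!) best = s;
  }
  return best;
}

const checkoutTrace: TimedSpan[] = [
  { spanId: "root", name: "GET /checkout", start: 0, end: 100 },
  { spanId: "db", parentSpanId: "root", name: "SELECT cart", start: 10, end: 70 },
  { spanId: "cache", parentSpanId: "root", name: "GET cache", start: 20, end: 30 },
  { spanId: "render", parentSpanId: "root", name: "render page", start: 80, end: 95 },
];
console.log(slowestBySelfTime(checkoutTrace)?.name); // prints "SELECT cart"
```

The root span always has the longest total duration, so ranking by self time is usually a better starting point for finding where the time actually went.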

You can get started with Jaeger Tracing for free.

5. SigNoz

SigNoz is an open-source distributed tracing and application performance monitoring tool. It captures logs, traces, and metrics all in one place.

Features of SigNoz

It uses a unified UI to present logs, metrics, and traces.
It offers insightful performance metrics such as the p50, p95, and p99 latency.
It is free to use.
It has native support for OpenTelemetry.
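Percentiles such as p50, p95, and p99 summarize a latency distribution: p95 is the latency that 95% of requests stay at or below. The sketch below uses the nearest-rank method on raw samples; it is not SigNoz's implementation, and tools that compute percentiles from histograms or with interpolation may report slightly different values. The sample latencies are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

const latenciesMs = [12, 15, 11, 240, 14, 13, 16, 18, 950, 17];
for (const p of [50, 95, 99]) {
  console.log(`p${p}: ${percentile(latenciesMs, p)} ms`);
}
// p50: 15 ms
// p95: 950 ms
// p99: 950 ms
```

With only ten samples, p95 and p99 both land on the single slowest request, which is why tail percentiles become meaningful only over large numbers of requests.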

Get started with SigNoz for free.

6. New Relic

New Relic is an APM tool. It uses the “New Relic Edge” service for distributed tracing and can observe 100% of an application’s traces.

Features of New Relic Edge

It offers distributed tracing and sampling options for a wide range of technology stacks.
It supports the industry-standard observability framework, OpenTelemetry.
It supports alerts & dashboards to diagnose errors before customers notice them.
It offers an easy setup where developers can install one agent to instrument the entire application code automatically.
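Sampling decides which traces are kept. A common head-based approach derives the decision from the trace ID itself, so every service reaches the same decision for the same trace without coordinating. The sketch below is similar in spirit to OpenTelemetry's trace-ID-ratio sampler but is neither its exact algorithm nor New Relic's; `shouldSample` is a hypothetical name, and it assumes 32-character hex trace IDs and uses their last 32 bits.

```typescript
// Deterministic head-based sampler: keep roughly `ratio` of all traces.
function shouldSample(traceId: string, ratio: number): boolean {
  if (!/^[0-9a-f]{32}$/.test(traceId)) throw new Error("expected 32 lowercase hex chars");
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Treat the last 8 hex chars (32 bits) as a uniformly distributed number.
  const low32 = parseInt(traceId.slice(-8), 16);
  return low32 < ratio * 0x100000000;
}

console.log(shouldSample("4bf92f3577b34da6a3ce929d0e0e4736", 0.25)); // prints true
```

A tool that observes 100% of traces, as described above for New Relic Edge, avoids discarding data at the source; choosing which traces to keep after they have completed is known as tail-based sampling.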

If the benefits of using New Relic interest you, get started for free.

7. DataDog

DataDog is another APM vendor offering cloud monitoring and observability.

Features of DataDog

Performance dashboards for web services, queues, and databases to monitor errors, requests, and latency.
Correlation of distributed tracing to browser sessions, logs, profiles, network, processes, and infrastructure metrics.
It can ingest 50 traces per second per APM host.
It can support seamless instrumentation to monitor cloud infrastructure.
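A per-host ingestion limit like the one above can be pictured as a rate limiter in front of the trace pipeline. The token-bucket sketch below admits on average at most `ratePerSec` new traces per second, with bursts of up to `burst`; it is a hypothetical illustration of the concept, not how the Datadog agent implements its limits. The injectable clock exists only to make the class easy to test.

```typescript
// Token-bucket limiter for new traces (hypothetical illustration only).
class TraceRateLimiter {
  private readonly ratePerSec: number;
  private readonly burst: number;
  private readonly now: () => number; // clock in milliseconds
  private tokens: number;
  private lastRefillMs: number;

  constructor(ratePerSec: number, burst: number = ratePerSec, now: () => number = () => Date.now()) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.now = now;
    this.tokens = burst; // start with a full bucket
    this.lastRefillMs = now();
  }

  // Returns true if a new trace may be kept now, false if it should be dropped.
  tryAcquire(): boolean {
    const nowMs = this.now();
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    // Refill in proportion to elapsed time, never beyond the bucket size.
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

For example, `new TraceRateLimiter(50)` would keep up to 50 traces per second and signal that the rest should be dropped.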

If the benefits of using DataDog interest you, get started for free.

8. Splunk

Splunk offers a distributed tracing tool that can ingest all application data for in-depth analysis.

Features of Splunk

It uses an AI-driven service to identify error-prone microservices.
It correlates application and infrastructure metrics.
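Splunk's AI-driven detection is proprietary, but the basic signal behind "error-prone microservices" can be shown with a plain rule-based sketch: compute each service's error rate from its spans and flag the services above a threshold. `ServiceSpan`, `errorRates`, `errorProneServices`, and the `minSpans` guard against tiny samples are all hypothetical.

```typescript
interface ServiceSpan {
  service: string;
  isError: boolean; // e.g. the span recorded an exception or an HTTP 5xx status
}

interface ServiceErrorRate {
  service: string;
  total: number;
  errors: number;
  rate: number;
}

// Error rate per service, sorted from most to least error-prone.
function errorRates(spans: ServiceSpan[]): ServiceErrorRate[] {
  const counts = new Map<string, { total: number; errors: number }>();
  for (const span of spans) {
    const c = counts.get(span.service) ?? { total: 0, errors: 0 };
    c.total += 1;
    if (span.isError) c.errors += 1;
    counts.set(span.service, c);
  }
  return Array.from(counts, ([service, c]) => ({
    service,
    total: c.total,
    errors: c.errors,
    rate: c.errors / c.total,
  })).sort((a, b) => b.rate - a.rate);
}

// Services whose error rate exceeds `maxRate`, ignoring services with too
// few spans for the rate to be meaningful.
function errorProneServices(spans: ServiceSpan[], maxRate: number, minSpans = 20): string[] {
  return errorRates(spans)
    .filter((r) => r.total >= minSpans && r.rate > maxRate)
    .map((r) => r.service);
}
```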

You can get started with Splunk for free.

Figure: Splunk UI for Tracing

9. Honeycomb

Honeycomb is an observability tool that supports distributed tracing. Its most prominent feature is that it uses anomaly detection to tell which spans are tied to bad user experiences.

Benefits of Honeycomb

Offers instrumentation with no vendor lock-in using OpenTelemetry.
Offers anomaly detection on spans to pinpoint bad user experiences.
Offers a pay-as-you-go pricing model to only pay for what you use.
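As a rough illustration of span-level anomaly detection (not Honeycomb's actual algorithm), the sketch below compares how often each attribute value appears among slow spans versus all other spans, and ranks the values by the difference. The threshold, the attribute names, and the `suspiciousAttributes` helper are hypothetical.

```typescript
interface AttributedSpan {
  durationMs: number;
  attributes: Record<string, string>;
}

interface AttributeSignal {
  pair: string; // "key=value", used only as a label
  slowShare: number; // fraction of slow spans carrying this pair
  baselineShare: number; // fraction of the other spans carrying this pair
}

// Count how many spans in a group carry each key=value pair.
function countPairs(group: AttributedSpan[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const span of group) {
    for (const [key, value] of Object.entries(span.attributes)) {
      const pair = `${key}=${value}`;
      counts.set(pair, (counts.get(pair) ?? 0) + 1);
    }
  }
  return counts;
}

// Rank attribute values by how much more common they are among slow spans.
function suspiciousAttributes(
  spans: AttributedSpan[],
  slowThresholdMs: number,
  top = 3,
): AttributeSignal[] {
  const slow = spans.filter((s) => s.durationMs >= slowThresholdMs);
  const baseline = spans.filter((s) => s.durationMs < slowThresholdMs);
  if (slow.length === 0) return [];
  const slowCounts = countPairs(slow);
  const baselineCounts = countPairs(baseline);
  const signals: AttributeSignal[] = [];
  for (const [pair, n] of Array.from(slowCounts)) {
    signals.push({
      pair,
      slowShare: n / slow.length,
      baselineShare: baseline.length > 0 ? (baselineCounts.get(pair) ?? 0) / baseline.length : 0,
    });
  }
  const lift = (s: AttributeSignal) => s.slowShare - s.baselineShare;
  return signals.sort((a, b) => lift(b) - lift(a)).slice(0, top);
}
```

A pair that shows up in most slow spans but few fast ones (for example, one region or one endpoint) is a good first lead when investigating a bad user experience.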

You can browse the Honeycomb Sandbox to get a hands-on experience with the tool before integrating it.

Figure: Honeycomb UI for Tracing

Concluding Thoughts

Distributed tracing has become essential for development teams working in distributed microservice architectures, helping them identify and fix problems as soon as they occur. I only covered 9 tools here, but there are many more. Select the one that best fits your needs today and in the future, and that serves both developers and DevOps users.

Thank you for reading.