Why is metric collection still a hard problem in 2020?

High cardinality labels

Over the last decade, high cardinality labels became a hot topic in metric collection. Labels let you break metrics down, then filter and/or aggregate by label values. Without this capability, you have to know a lot more about your system and how it behaves in order to monitor it. Labels don’t only let you break down your data along different dimensions; they also let you create new dimensions dynamically from existing ones.
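To make the idea concrete, here is a minimal in-memory sketch of a labeled counter. The class name `LabeledCounter`, its methods, and the label keys (`method`, `status`, `customer`) are all hypothetical and chosen for illustration; they are not taken from any particular metrics library. It shows breaking a metric down by a label, and deriving a new dimension (a status class) from an existing one at query time.

```python
from collections import defaultdict


class LabeledCounter:
    """Hypothetical counter where each distinct label combination is its own series."""

    def __init__(self):
        self._series = defaultdict(int)

    def inc(self, labels, value=1):
        # A frozenset of (key, value) pairs identifies one series.
        self._series[frozenset(labels.items())] += value

    def sum_by(self, *keys):
        # Aggregate all series, grouping by the requested label keys.
        out = defaultdict(int)
        for label_set, value in self._series.items():
            labels = dict(label_set)
            out[tuple(labels.get(k) for k in keys)] += value
        return dict(out)

    def sum_by_fn(self, fn):
        # Group by a dimension computed from existing labels at query time.
        out = defaultdict(int)
        for label_set, value in self._series.items():
            out[fn(dict(label_set))] += value
        return dict(out)

    def cardinality(self):
        # Number of distinct series; grows with every new label combination.
        return len(self._series)


requests = LabeledCounter()
requests.inc({"method": "GET", "status": "200", "customer": "a"})
requests.inc({"method": "GET", "status": "500", "customer": "b"})
requests.inc({"method": "POST", "status": "200", "customer": "a"}, 2)

by_status = requests.sum_by("status")
by_customer = requests.sum_by("customer")
# New dimension derived from an existing label: "200" becomes "2xx".
by_status_class = requests.sum_by_fn(lambda labels: labels["status"][0] + "xx")
```

Note that a label such as `customer` is where cardinality tends to grow: every new value multiplies the number of series the backend has to store.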

Cross-stack labels

Even though metric labelling is seeing more adoption, propagating labels on the wire is still an unsolved puzzle. There are no well-established propagation standards, either on the wire or in language runtimes.

Correlation and exemplars

In a typical telemetry data collection pipeline, there is often more than just metrics. We instrument our services with a variety of tools. One example everyone can relate to is logs. Others include events, distributed traces, runtime profiles coming from production, and any other telemetry data you can name.

Export formats

Collecting metrics has always been difficult because there are far too many ways in which services and platforms export metric data. Each year, a few new initiatives try to solve this outstanding problem.

Pull vs push

In a pull model, your metric collection system pulls metrics from your services, whereas in a push model your services push metrics to a metric collection service. This is a routine topic of debate, even though there is no single answer to whether pulling or pushing metrics is the better approach.
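The difference is mostly about who initiates the transfer. The following in-process sketch illustrates that under simplifying assumptions: there is no network, and `Service`, `PullCollector`, and `PushCollector` are hypothetical names rather than the API of any real system.

```python
class Service:
    """Hypothetical service that tracks one metric."""

    def __init__(self, name):
        self.name = name
        self.request_count = 0

    def handle_request(self):
        self.request_count += 1

    def snapshot(self):
        # Current metric values, as they would be exposed or sent.
        return {"service": self.name, "request_count": self.request_count}


class PullCollector:
    """Pull model: the collector knows its targets and asks each one for metrics."""

    def __init__(self, targets):
        self.targets = targets
        self.samples = []

    def scrape_all(self):
        for target in self.targets:
            self.samples.append(target.snapshot())


class PushCollector:
    """Push model: the collector only receives; services decide when to send."""

    def __init__(self):
        self.samples = []

    def receive(self, sample):
        self.samples.append(sample)


svc = Service("checkout")
svc.handle_request()

# Pull: the collector initiates the transfer.
puller = PullCollector([svc])
puller.scrape_all()

# Push: the service initiates the transfer.
pusher = PushCollector()
pusher.receive(svc.snapshot())
```

Both collectors end up with the same sample here. In practice the trade-offs show up elsewhere: a pull collector has to discover and reach its targets, while a push collector has to cope with whatever rate its clients choose to send at.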

Aggregation

Metric collection pipelines often aggregate data because reporting each individual measurement wouldn’t scale in large systems. Aggregation often happens in multiple layers: metrics collected from services can be continuously aggregated along the way until they reach long-term storage. Building and operating aggregation pipelines is not always trivial. In addition to pipelines, there are instrumentation libraries that start aggregating inside the application process. Figuring out the right aggregation window and fine-tuning aggregations for performance can be a hard problem.
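As a rough illustration of layered aggregation, the sketch below first groups raw measurements into fixed windows and then rolls those windows up into larger ones, as a downstream layer might. The function names `aggregate` and `rollup`, the window sizes, and the sample points are assumptions made for this example only.

```python
from collections import defaultdict


def aggregate(points, window_seconds):
    """Group (timestamp_seconds, value) points into fixed-size windows."""
    buckets = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for ts, value in points:
        start = ts - (ts % window_seconds)  # start of the window containing ts
        bucket = buckets[start]
        bucket["count"] += 1
        bucket["sum"] += value
        bucket["min"] = min(bucket["min"], value)
        bucket["max"] = max(bucket["max"], value)
    return dict(sorted(buckets.items()))


def rollup(windows, window_seconds):
    """Merge already aggregated windows into larger windows (a second layer)."""
    out = {}
    for start, agg in windows.items():
        new_start = start - (start % window_seconds)
        current = out.get(new_start)
        if current is None:
            out[new_start] = dict(agg)
        else:
            current["count"] += agg["count"]
            current["sum"] += agg["sum"]
            current["min"] = min(current["min"], agg["min"])
            current["max"] = max(current["max"], agg["max"])
    return dict(sorted(out.items()))


points = [(0, 1.0), (5, 3.0), (12, 2.0), (61, 10.0)]

ten_second = aggregate(points, 10)    # first layer, e.g. inside the process
one_minute = rollup(ten_second, 60)   # second layer, e.g. in the pipeline
```

Count, sum, min, and max merge cleanly across layers, which is why they are used here. Statistics such as percentiles cannot be recovered exactly from these summaries, which is part of what makes choosing windows and aggregations hard.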
