
Observability Next: Tackling The Why

May 30, 2025
By Bob Quillin
The Why of Observability

To seriously consider what is Observability Next, we need to start with the most fundamental question we should be asking our teams:

Why are we collecting all this data?

This singular question opens the floodgates to more why’s. Why do we need to store all these logs and metrics? Even with all these logs, metrics, and traces, why can’t we find the root cause of this issue? Why did it take us three days to track down that problem?

If you start with the why’s, observability objectives become crystal clear. Observability today relies too heavily on the philosophy of “collect and store everything, ask why later.”

Why is Observability Costing Us So Much?

Rising observability costs are painful, but they’re a symptom of a more foundational issue. We pay to ingest this data, index it, store it, and when necessary, rehydrate it. But back to the why:

Why are we observing our infrastructure and application performance in the first place?

The end game of observability is not to observe but to:

  • Prevent and solve problems
  • Reduce MTTR
  • Accelerate root cause analysis
  • Unearth critical new business KPIs

So why is observability costing so much? If we worked backwards from the problems we’re trying to solve, we could collect just what we need, when we need it, and for whom. That would lead us back to the intent behind what we’re collecting.

In the absence of that clarity, we fall back on collecting and storing everything. The observability ecosystem today meets the market exactly where it is—delivering products that do just that.

Fat Telemetry Pipes and Massive Data Lakes

So how did we get here? Proprietary agents make it trivially simple to collect everything. Fat telemetry pipes stream data up to ever-growing, amazingly efficient data stores—meeting the market need for more and cheaper telemetry storage.

This model of fat “dumb” pipes and massive data lakes unfortunately reinforces the worst lazy engineering habits we’ve developed over the last decade of observability.

If you don’t know what to collect or how to analyze it, you store everything you can and slap the data into an ever-increasing number of… dashboards.

The Goal of Observability Isn’t the Dashboard—Is It?

Why do we have so many dashboards?

That’s one of my favorite “why” questions. Did you ever stop to think that there has to be more to observability than graphs, charts, and dashboards? It makes sense—since so much telemetry is flooding in, the only way to make sense of it seems to be yet another dashboard.

For evidence, just count the number of dashboard tabs you keep open in your browser at any point in time.

Observability Next—to me—is imagining observability without dashboards. What if:

  • Observability solutions presented answers instead of dashboards
  • The only telemetry you sent was the telemetry needed for those answers
  • Analytics could provide feedback to collectors and pipelines to request more data when needed, then dial it down when done
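As a thought experiment, that feedback loop might look something like the sketch below. Everything here is hypothetical (there is no real `AdaptiveCollector` API): the analytics layer signals the collector to raise its sampling rate during an incident, then dial it back down once the signal clears.

```python
import random

class AdaptiveCollector:
    """Hypothetical sketch: a collector whose sampling rate is driven
    by the analytics layer instead of being fixed at ingest time."""

    def __init__(self, base_rate=0.01, burst_rate=1.0):
        self.base_rate = base_rate    # steady-state sample rate (1%)
        self.burst_rate = burst_rate  # rate while investigating (100%)
        self.rate = base_rate

    def on_feedback(self, error_rate, threshold=0.05):
        # Two-way feedback: analytics requests full-fidelity telemetry
        # during an incident, and dials it down when done.
        self.rate = self.burst_rate if error_rate > threshold else self.base_rate

    def should_send(self):
        # Probabilistic sampling at the current rate.
        return random.random() < self.rate

collector = AdaptiveCollector()
collector.on_feedback(error_rate=0.10)  # incident detected upstream
assert collector.rate == 1.0            # send everything while it matters
collector.on_feedback(error_rate=0.01)  # signal has cleared
assert collector.rate == 0.01           # back to the cheap steady state
```

The point of the sketch is the direction of the arrows: telemetry volume becomes an output of the analysis, not a fixed input to it.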

Disrupting the Observability Supply Chain

The building blocks of Observability Next are already forming across the observability supply chain. Starting with collection, then moving to control, and finally to analysis—each layer requires the others:

  • Collection and instrumentation: OpenTelemetry
  • Distribution and control: Adaptive Control Planes
  • Intelligent problem solving: Inference Engines

OpenTelemetry is already reshaping observability by breaking vendor lock-in at the data collection layer. Instead of proprietary instrumentation that ties you to a single platform, OpenTelemetry offers open, flexible, and vendor-neutral telemetry—putting data ownership and control back in the hands of engineering teams.
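That vendor neutrality is visible in a minimal OpenTelemetry Collector configuration: the instrumented code only speaks OTLP, and switching backends is a one-line exporter change in the pipeline, not a re-instrumentation project. (The endpoint below is a placeholder, not a real backend.)

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  otlphttp:
    # Swap backends here, not in your application code.
    endpoint: https://backend.example.com:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```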

The Need for Control

Building scalable cloud-native systems requires a clear separation between the data plane and the control plane. Only then can a full management plane emerge to drive the next generation of intelligent problem solving.

OpenTelemetry has laid the foundation for open data collection, but it needs a robust control plane to give organizations real feedback loops and control over their telemetry. By evolving from one-way data pipelines to two-way feedback systems, teams can actively manage telemetry to achieve desired outcomes—just like in traditional control systems where controllability and observability work hand in hand.
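To make the control-systems analogy concrete, here is an illustrative sketch (hypothetical names, not a real product API) of a proportional controller that treats telemetry volume as the controlled variable, nudging the sample rate toward an ingest budget:

```python
class TelemetryController:
    """Illustrative sketch: a proportional controller over telemetry
    volume, in the spirit of classic control theory where
    controllability and observability work hand in hand."""

    def __init__(self, target_gb_per_day, kp=0.2):
        self.target = target_gb_per_day
        self.kp = kp              # proportional gain
        self.sample_rate = 1.0    # start by sending everything

    def adjust(self, observed_gb_per_day):
        # Error: how far actual ingest is from budget, as a fraction.
        error = (self.target - observed_gb_per_day) / self.target
        # Nudge the rate toward the target, clamped to [0.01, 1.0].
        self.sample_rate = min(1.0, max(0.01, self.sample_rate * (1 + self.kp * error)))
        return self.sample_rate

ctrl = TelemetryController(target_gb_per_day=100)
rate = ctrl.adjust(observed_gb_per_day=400)  # ingesting 4x the budget
assert 0.01 <= rate < 1.0                    # controller dials sampling down
```

Each call to `adjust` is one pass around the feedback loop: observe the output, compare it to the desired outcome, and act on the pipeline accordingly.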

Why This Matters for Engineering Leaders

As engineering leaders, you’re constantly balancing competing priorities:

  • Accelerating feature delivery while maintaining system reliability
  • Managing operational costs while improving visibility and control
  • Reducing downtime and human churn while ensuring critical issues are addressed

Observability Next must directly address these challenges by going back to the Why:

  • Reducing mean time to resolution (MTTR) through better APM
  • Optimizing telemetry to focus on high-value signals that actually solve problems
  • Discovering how to master new AI tech with a human in the loop

Observability Next – A Newsletter for You

For ongoing conversation and engagement, check out the “Observability Next” newsletter on LinkedIn. It’s designed for engineering leaders navigating the rapidly evolving observability landscape.

Each edition explores trends, strategic shifts, and practical ideas to help teams move beyond the status quo—toward faster resolution, clearer insight, and smarter operations. I share industry patterns, personal experiences, and approaches that challenge conventional thinking.

