Distilling Observability: From Noise to Knowing
ControlTheory's "Distilling Observability" series explores how observability is evolving from raw data capture to continuous, intelligent understanding. From surfacing the right insights, to finding the right answers, to cutting the right costs — this series reframes how teams turn noise into knowledge.
If you’re a DevOps engineer or SRE managing workloads on Kubernetes, you already know how fast your telemetry data grows. A small change to a Kafka consumer or PostgreSQL parameter can ripple through an entire environment. Queues back up, user transactions slow down, and your incident channel starts buzzing.
You open CloudWatch or Datadog. Metrics show rising latency, but they don’t show what’s causing it. You end up in the same place every engineer goes: the logs. That’s where truth lives, but also where time goes to die.
Logs remain the most complete signal we have. The problem is that we collect too many of them, too late, and at too high a cost for effective log analysis.
The Modern Observability Pain for SREs and DevOps Engineers
Observability challenges come in many forms, but most trace back to four key areas:
- Coverage – what percentage of your applications are instrumented for observability, and what is stopping you from being at 100% in dev, test, and production?
- Completeness – is the data you are collecting giving the granularity, detail, and insights needed to improve reliability and customer application experience?
- Complexity – are you spending more time building your observability/OTel tooling than on revenue-generating activities?
- Cost – is the increasing cost of operations impacting whether and how you build observability into applications?
All of these factors come into play regardless of the scale of your environment or the skill of your engineers.
| Challenge | Common Scenario | Impact |
| --- | --- | --- |
| Log Volume Explosion | Microservices, Kafka, and functions each generate their own structured logs. A single release can create gigabytes of repetitive data. | High storage and query costs. |
| Delayed Insight | Logs are shipped to CloudWatch or Datadog, parsed, indexed, and queried post-hoc. | Context arrives after incidents occur. |
| Tool Friction | Dashboards summarize symptoms, not causes. Engineers still pivot to the CLI or raw logs. | Increased MTTR and cognitive fatigue. |
| Developer Detachment | Front-end and app teams rarely explore observability tools directly. | Slower feedback during continuous deployment. |
In most mid-size environments, observability now costs more than production compute. According to the 2025 Grafana Labs State of Observability report, 87% of organizations rely primarily on logs for root cause visibility. Yet the more we collect, the harder it becomes to find meaning.
Why Traditional Log Pipelines Struggle in EKS Environments
Let’s take two examples from our community:
- Fintech use-case: A regional bank runs a credit-risk platform on Amazon EKS with PostgreSQL (RDS), Kafka, and Python-based scoring services. Each deployment triggers high log churn — Kafka consumer lag warnings, RDS connection resets, and pod restart events.
- SaaS use-case: A B2B analytics product uses React front-ends, Node-based APIs, and Lambda functions for batch jobs. Error bursts during releases flood CloudWatch with thousands of identical traces.
Both teams need full observability for compliance and performance. But they don’t need to ship every log line to CloudWatch or Datadog for effective log analysis. What they need is a way to see structure at the edge, decide what’s useful, and get fast and clear insights from it.

Traditional vs Distilled Telemetry Flow
Enter Gonzo: Real-Time Log Distillation in Your Console
Gonzo is an open-source terminal UI (TUI) built for SREs and DevOps engineers who live in the CLI. It connects directly to OpenTelemetry (OTel) log streams and a wide variety of other formats, automatically grouping, filtering, and highlighting patterns in real time, before logs ever hit your backend.
You can run it locally in your terminal, pointing directly at your Kubernetes cluster logs, or connect it as an OTLP receiver to stream data from your existing OTel collectors or agents.
How it works:
- Point Gonzo at your log source: kubectl logs, integrations like Stern or K9s, or an OTLP stream from your OpenTelemetry Collector or any instrumented application.
- Gonzo listens locally or over the network, auto-detects the format, and distills incoming telemetry into real-time patterns grouped by severity, service, and version (a minimal sketch of this kind of pattern clustering follows this list).
- Use Gonzo for instant insights that identify key patterns and surface issues without having to leave the console.
- Continue your preferred traditional logging flow in parallel for retention and additional forensics.
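To make "distilling" concrete, here is a minimal, hypothetical sketch of the kind of pattern clustering involved: variable tokens such as UUIDs, timestamps, and numbers are masked so that repetitive lines collapse into a handful of templates with counts. This only illustrates the concept; it is not Gonzo's actual implementation.

```python
import re
import sys
from collections import Counter

# Masks applied in order: specific tokens (UUIDs, timestamps) before generic
# numbers, so that similar log lines collapse into the same template.
MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T[\d:.]+Z?"), "<ts>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<num>"),
]

def template(line: str) -> str:
    """Reduce a raw log line to a reusable pattern by masking variable tokens."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line.strip()

def distill(lines):
    """Group lines by template and return the ten most frequent patterns."""
    counts = Counter(template(line) for line in lines)
    return counts.most_common(10)

if __name__ == "__main__":
    # Example (hypothetical filename): kubectl logs deployment/my-app | python distill_sketch.py
    for pattern, count in distill(sys.stdin):
        print(f"{count:>8}  {pattern}")
```

Piping a noisy stream through a sketch like this shows how a few templates usually account for most of the volume, which is the same intuition behind Gonzo's Log Counts Analysis view.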
For example, a simple OpenTelemetry Collector config to mirror production logs to Gonzo might look like this:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/gonzo:
    endpoint: "localhost:4317"
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlp/gonzo]
```

Now you can launch Gonzo in OTLP mode:
```bash
gonzo --otlp-enabled --listen :4317
```

You'll start seeing log streams from your applications instantly, grouped by service, severity, and attribute. From there, Gonzo surfaces the truths hidden in the noise of vast amounts of logs.
In this case, we see:
- Repeated TimeoutError messages in trade-processor pods
- Kafka consumer lag spikes coinciding with RDS connection refused errors
- 5xx bursts from frontend-api pods during blue/green rollout
Inside Gonzo these patterns are discovered and surfaced for quick troubleshooting without having to pore over logs manually.
From the Log Counts Analysis view, we see patterns that Gonzo has already identified, including groupings by severity.

In the main view, you can tab or click into the logs.

All you have to do is press i to initiate analysis of the log detail using a local or hosted AI model of your choice. As you can see in the example below, Gonzo observed the pattern and was able to correlate it to a specific source.

With Gonzo, you keep your Datadog, CloudWatch, or Loki pipelines intact, but now you get distilled, high-value telemetry instead of raw volume. The result is increased clarity and an opportunity to reduce your observability costs.
The goal is instant insight that drastically cuts MTTR, without opening logs manually and without ever leaving the terminal.
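As a sketch of what "pipelines intact" looks like in practice, the earlier Collector config can be extended with a second exporter so logs flow both to your existing backend and to Gonzo. The `otlphttp/backend` name and endpoint below are placeholders for whatever OTLP-compatible destination you already use:

```yaml
exporters:
  otlphttp/backend:              # placeholder for your existing OTLP-compatible backend
    endpoint: "https://otlp.example.com"
  otlp/gonzo:                    # local Gonzo instance, as before
    endpoint: "localhost:4317"
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp/backend, otlp/gonzo]
```

If you already ship through vendor-specific exporters, they slot into the same exporters list.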
How Gonzo Finds Problems Faster With Existing Toolchains
Gonzo doesn’t replace your existing tools; it helps you find problems faster by clustering log patterns, highlighting anomalies, and revealing relationships that are often buried in dashboards and complex queries. Gonzo runs alongside your existing pipelines, giving you instant visibility into what’s really happening, directly in your terminal, before you dig into Datadog, CloudWatch, or Loki.
Gonzo analyzes logs in real time, surfacing root-cause patterns from data that's already ingested or streaming in. It helps you see structure and meaning faster, without altering or filtering your existing pipelines.
| Platform | What It Provides | How Gonzo Enhances It |
| --- | --- | --- |
| CloudWatch | Native AWS log collection and alerting | Analyze live log streams to detect repetitive or high-frequency patterns without filtering or dropping data |
| Datadog | Dashboards, metrics, APM | Dual-ship or stream logs to Gonzo for analysis before or after ingestion, depending on your pipeline setup |
| Grafana Loki | Query and visualization | Adds contextual layers to logs live-tailed from Grafana Loki, highlighting service, region, and deployment-level attributes for faster correlation |
| OpenTelemetry | Unified data model, rich attributes/metadata | Bubbles up patterns in logs so you can identify trends and signals, with attribute/metadata statistics and filtering |
| Other Logs | Static files or arbitrary streams | Analyze static files or stream any log for real-time analysis using the interactive TUI |
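For the Loki row, one way to feed a live tail into Gonzo is logcli piped over stdin. This assumes logcli is installed and LOKI_ADDR points at your Loki instance; the label selector is a placeholder:

```bash
# Live-tail a Loki stream (selector is an example) and distill it in Gonzo
logcli query '{app="frontend-api"}' --tail | gonzo
```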
Faster Clarity in a Financial Services Stack
In one scenario, an SRE and a developer collaborated using Gonzo against the same EKS cluster that runs their credit-risk workloads. The SRE spotted anomalies across Kafka and RDS; the developer traced the faulty service version in seconds, all from the terminal.
During a release, queue delays appeared in Kafka, and transaction latency increased. Gonzo surfaced the issue immediately: a single version of the scoring service was retrying RDS writes in a loop.
Instead of scrolling through 400,000 redundant log lines, the engineer saw three clustered error patterns in under 30 seconds. The engineer retained those clustered patterns in Datadog for audit tracking and used the insights to tune their CloudWatch retention, keeping only the logs that mattered for compliance. Gonzo doesn’t filter logs directly; it provides the signal that helps teams optimize what they send downstream.
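For reference, tightening CloudWatch retention after this kind of analysis can be as simple as setting a retention policy per log group; the log group name and retention period below are placeholders:

```bash
# Keep only 30 days of a noisy application log group (values are examples)
aws logs put-retention-policy \
  --log-group-name /eks/credit-risk/trade-processor \
  --retention-in-days 30
```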
The outcome: faster triage, lower data egress, and zero loss of visibility.

Streaming Telemetry From Microservices to Gonzo
Why Logs Still Matter (and Always Will)
Metrics and traces show symptoms. Logs explain the behavior that caused them. They are still the most granular, human-readable view of a system’s truth.
The goal isn’t to collect less; it’s to collect intelligently by distilling insights from volumes of noisy logs. Gonzo gives SREs, DevOps, and Ops engineers the controllability they’ve been missing: distillation before ingestion, insight before query, and clarity before cost.
Gonzo Quick Start Guide
Install with Homebrew:
```bash
brew install gonzo
```

Then run against your EKS environment:
```bash
kubectl logs -f deployment/my-app | gonzo
```

Watch as repetitive noise compresses into actionable clusters, live in your terminal, before it becomes another terabyte of usage in your observability bill.
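Because Gonzo reads from stdin, any tool that streams logs can feed it. For example, assuming Stern is installed, you could tail every matching pod in a namespace; the pod query and namespace here are placeholders:

```bash
# Tail all pods matching "trade-processor" in the prod namespace and pipe into Gonzo
stern trade-processor --namespace prod | gonzo
```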
Control your observability. Don’t let it control you.
Keep an eye out for Part 2 of ControlTheory’s “Distilling Observability” series to learn more about how observability is evolving from raw data capture to continuous, intelligent understanding.