Lambda · ECS/Fargate · CloudWatch · OpenTelemetry

AWS Serverless Log Analysis —
Möbius Finds the Lambda and Fargate Failures You’d Spend an Hour Looking For.

Serverless ships fast. Debugging does not. A single request crosses Lambda concurrency, Fargate tasks, API Gateway, and queues, and the evidence lands in a dozen log groups. Stop grepping CloudWatch. Dstl8’s AI agent reads the logs, finds the pattern, names the cause, and hands you the fix.

4–8 hrs

TYPICAL MANUAL CORRELATION TIME ACROSS LAMBDAS

15–30 min

SAME INCIDENT WITH AI-LED ANALYSIS

50

LOG GROUP CAP PER CLOUDWATCH INSIGHTS QUERY

1

UNIFIED INCIDENT VIEW INSTEAD OF N LOG GROUPS

2 min

FROM brew install gonzo TO FIRST USABLE STREAM

lambda failed in prod → which invocation? which stream? why?

cloudwatch insights 50 log group limit → query hits a wall

api gateway → lambda → sqs → lambda → which hop broke?

fargate task exit code 137 → oom? healthcheck? deployment?

x-ray 5% sampling → your failed request wasn’t one of them

cloudwatch insights cross account → single region, single account

adot lambda layer cold start → observability tax on every invoke

one request across 47 log streams → good luck

claude code deployed the fix → what did it actually do at runtime?

aws logs tail | gonzo → live signal in the terminal

Six Ways AWS Serverless Debugging Eats Your Afternoon.

CloudWatch has logs. The incident path does not. A real serverless failure spans log groups, concurrent instances, accounts, and services, and CloudWatch asks you to reconstruct all of it manually while the page is still open. These are the structural reasons time-to-answer stays long.

01

You’re the one doing the correlation

A user request touches half a dozen Lambdas and a Fargate task. Each Lambda has its own log group. Each group splits into a log stream per concurrent instance. CloudWatch gives you the raw entries. Stitching them into an incident timeline is manual work.

# CloudWatch shows
/aws/lambda/checkout-api 47 streams
/aws/lambda/webhook-processor 12 streams
/aws/ecs/payment-worker 6 streams

# you want
one timeline for request req-8f507cfc

# the gap: 4-8 hours of grepping
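
What that stitching actually looks like is the same filter run against every log group on the path, then merged by timestamp by hand. A rough sketch, assuming the request ID appears verbatim in each service's output (the group names and timestamps here are illustrative):

# run per log group; --start-time / --end-time are epoch milliseconds
aws logs filter-log-events \
  --log-group-name /aws/lambda/checkout-api \
  --filter-pattern '"req-8f507cfc"' \
  --start-time 1714000000000 --end-time 1714003600000
# ...repeat for webhook-processor, payment-worker, and every other hop,
# then stitch the results into one timeline yourself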

02

CloudWatch Insights runs out of room fast

Teams that commit to Insights hit ceilings mid-incident, and every query scans raw gigabytes, billing you a second time for data you already paid to ingest.

concurrent queries: 30 max (Standard)
log groups per query: 50 max
records returned: 10,000 max
cross-account: unsupported
cross-region: unsupported

# the incident is bigger than the tool
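
For reference, a single Insights run from the CLI looks like the sketch below; the group names and request ID are illustrative, and the --log-group-names list is exactly where the 50-group cap bites.

aws logs start-query \
  --log-group-names /aws/lambda/checkout-api /aws/lambda/webhook-processor /aws/ecs/payment-worker \
  --start-time 1714000000 --end-time 1714003600 \
  --query-string 'fields @timestamp, @log, @message | filter @message like /req-8f507cfc/ | sort @timestamp asc'

aws logs get-query-results --query-id <query-id>
# results top out at 10,000 records, one account, one region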

03

X-Ray doesn’t see the request that failed

X-Ray samples roughly 1 request per second plus 5% of the rest. When your customer hits a single-request failure, the trace almost certainly doesn’t exist. You fall back to logs anyway.

X-Ray coverage: ~5%
single-request failure: 95% chance no trace
next step: back to log groups
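
You can check what your account is actually sampling; the default rule ships with a one-request-per-second reservoir plus a 5% fixed rate, which is where the numbers above come from.

aws xray get-sampling-rules
# default rule: ReservoirSize 1, FixedRate 0.05
# a one-off failed request is ~95% likely to have no trace at all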

04

The CloudWatch UI is not built for active debugging

Page refresh wipes search state. No severity filter in the native view. Log streams split when concurrent instances scale. Export is limited to what’s currently on screen. You lose context exactly when you need it most.

> 47-minute scroll back
> refresh
> state: gone
> stream ended, new stream started
> find the same request: start over

05

OpenTelemetry helps, but the default path taxes cold starts

ADOT layers solve vendor lock-in, but they bundle auto-instrumentation, a collector extension, and a trace pipeline that add memory and cold-start overhead to every function. Most teams want that pipeline for tracing, not for logs. If all you need is structured logs shipped somewhere useful, the tax is pure overhead.

+ ADOT layer memory bump
+ auto-instrumentation +100-300ms cold start
+ collector extension must flush before sandbox freeze
# all of that, for log output you already wanted to ship
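
For contrast, the default ADOT path attaches a layer and an exec wrapper to every function, which is where that overhead comes from. Treat the snippet below as a sketch rather than a recipe: the layer ARN is a placeholder and the wrapper path varies by runtime.

aws lambda update-function-configuration \
  --function-name checkout-api \
  --layers <adot-lambda-layer-arn> \
  --environment "Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument}"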

06

CloudWatch Logs is a per-GB tax on every axis

Ingestion, storage, and query scanning all bill per GB. The defaults, indefinite retention on the Standard tier, are the most expensive path.

Lambda cost: $205 / yr
CloudWatch cost: $10,480 / yr
# same function
# retention: default (indefinite)
# one dev left full payload logging on
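
A rough reconstruction of how a bill in that range happens, assuming roughly $0.50 per GB ingested (typical us-east-1 list price for the Standard tier) and full request/response payloads logged on every invocation:

~57 GB/day ingested x $0.50/GB ≈ $28.50/day
$28.50/day x 365 ≈ $10,400/yr   # before storage and query-scan charges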

WHY THIS MATTERS

AWS serverless does not fail because it lacks logging. It fails in production when one request path crosses Lambda concurrency, Fargate tasks, multiple log groups, and sometimes multiple accounts. The debugging problem is time-to-answer. Raw log availability is not the bottleneck. Intelligent correlation is.

How AWS Serverless Teams Cut Time-to-Answer From Hours to Minutes.

The goal is not another place to read isolated log streams. The goal is an AI agent that reads the logs for you, correlates across services, names the cause, and tells you where to go next. Möbius does the first hour of triage before you sit down.

Möbius reads logs continuously and surfaces incidents before you open the dashboard

Dstl8’s AI agent watches your log streams for emerging failure patterns across Lambda, Fargate, and anything else you point at it. You find out because Möbius tells you, not because a customer did.

One view per service, regardless of concurrency

Dstl8 flattens log streams from every concurrent Lambda instance and every Fargate task into one logical service view. “checkout-api” is one place to look, not 47 separate streams that shift every time traffic scales.

Root cause narrative with named evidence

Dstl8 doesn’t just group related log lines. Möbius writes a short diagnosis: what happened, which service started it, which events confirm it, and the first thing to try. Every claim links back to the underlying log entries.

Query incidents from Claude Code, Cursor, or your terminal via MCP

The AI agent is reachable from the editor that wrote the code. Ask “what broke in checkout-flow in the last hour” from Claude Code and get an answer built on your live production logs, not a guess.
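
Hooking that up follows the standard MCP pattern: register a server in your editor's MCP config (for Claude Code, an .mcp.json at the project root) and ask in plain language. The server name, package, and key variable below are hypothetical placeholders, not Dstl8's published configuration; check the docs for the real values.

{
  "mcpServers": {
    "dstl8": {
      "command": "npx",
      "args": ["-y", "<dstl8-mcp-server-package>"],
      "env": { "DSTL8_API_KEY": "<your-key>" }
    }
  }
}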

Bring your OpenTelemetry logs straight in

Add an OTel log exporter to your app code and point it at Dstl8. No Lambda layer, no auto-instrumentation agent, no collector extension running alongside every invocation, no cold-start surcharge. CloudWatch ingestion also works day one for the workloads you want to leave there.

What AWS Serverless Teams Get With ControlTheory.

Proactive root cause analysis, with receipts.

Möbius watches your log surface continuously, clusters related failures into incidents, names the likely cause, and shows the exact log lines that support the diagnosis. You walk into the investigation with the answer already drafted. Ask follow-ups from your editor or terminal without switching contexts.

Faster answers when “which Lambda failed” is the symptom, not the cause.

1 service view across every concurrent invocation
The question stops being “which of the 47 log streams had the error?” and becomes “what pattern is Möbius seeing, and in which service?” Dstl8 flattens concurrent Lambda instances into one service view and ranks what’s actually broken.

One incident view for Lambda errors, Fargate failures, and API Gateway 5xxs.

You don’t need another isolated log viewer. You need one list of active incidents, grouped by related signal, ranked by impact, with enough context to tell whether the failure started in function code, a Fargate container crash, or a downstream throttle. Möbius does that ranking continuously, so the first thing you see is the thing worth fixing.

Better Fargate task logs when container failures don’t happen in isolation.

Task-level stdout logs are useful. The real speedup happens when a Fargate container exit is automatically correlated with the upstream Lambda invocations, API Gateway 5xxs, and adjacent queue behavior that caused it. Ingest via the existing awslogs driver. No task redefinition, no sidecar.
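
The awslogs block you already have in the task definition is all the plumbing this needs; a typical one looks like the fragment below, with the group and prefix names illustrative.

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/aws/ecs/payment-worker",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "payment-worker"
  }
}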

OpenTelemetry logs from your app, no Lambda layer required.

Add an OTel log exporter to your application code and point it at Dstl8. That’s it. No ADOT Lambda layer, no collector extension running alongside every invocation, no cold-start surcharge from auto-instrumentation. Your function emits structured OTLP logs as part of normal execution. If you already run ADOT for traces, keep it — just route the log signal to Dstl8.
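
If your runtime's OpenTelemetry SDK reads the standard OTLP exporter environment variables, the shipping side can be as small as the settings below. The endpoint and header are placeholders, not Dstl8's actual ingest address; use whatever your account provides.

OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://<your-dstl8-ingest-endpoint>/v1/logs
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=<your-key>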

Start with terminal-native streaming before you commit to a bigger platform workflow.

2 min to first usable stream
Use Gonzo to tail CloudWatch log groups from your terminal, across Lambda, Fargate, RDS, or any other AWS service that lands logs in CloudWatch. Timestamps in your local zone, flexible filters, and AI-powered summarization on demand. Bring in Dstl8 when you want Möbius doing the proactive detection, cross-service correlation, and in-editor Q&A against your full log surface.

AWS Serverless Debugging: How Teams Actually Solve This.

Capability | CLOUDWATCH CONSOLE ONLY | LOG DRAIN + GENERIC TOOL | ControlTheory
Proactive incident detection | watch manually | rules required | Möbius, always on
Unified view across Lambda instances and Fargate tasks | per log group, per stream | export only | one service view
Cross-account, cross-region log surface | unsupported | depends on setup | first-class
Pattern and anomaly detection across services | manual | rules required | continuous
Root cause narrative with cited evidence | | | Möbius diagnosis
Query incidents from Claude Code / Cursor / terminal | | | MCP-native

Start With Gonzo in Under 2 Minutes.

Open source terminal UI. No account, no agent, no configuration. Tail CloudWatch log groups and see what your Lambda is actually doing before you approve the next AI-generated patch.

Install Gonzo

Gonzo tails your log streams, surfaces patterns by severity, and sends individual entries to an LLM for explanation — all from your terminal. It’s the fastest way to bring real production signal into your AI SDLC workflow.

Homebrew

Go

Binary

Nix

Source

brew install gonzo

go install github.com/control-theory/gonzo/cmd/gonzo@latest

# Download the latest release for your platform from the releases page:
# github.com/control-theory/gonzo/releases

nix run github:control-theory/gonzo

git clone https://github.com/control-theory/gonzo.git
cd gonzo
make build

Usage examples:

# Tail a CloudWatch log group:

aws logs tail "/aws/lambda/checkout-api" --follow --format json | gonzo
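
# Narrow the window or point at a Fargate service the same way; --since and --format
# are standard aws-cli v2 flags, and the group name here is just an example:

aws logs tail "/aws/ecs/payment-worker" --follow --since 1h --format json | gonzo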

Stop Grepping CloudWatch. Let Möbius Find It.

Stream logs with Gonzo. Let Möbius detect, correlate, and diagnose with Dstl8. Trace Lambda errors, Fargate failures, and API Gateway 5xxs across services in one place, or ask your AI agent about them from Claude Code.

More for AI code generation reliability.

Gonzo + AWS CloudWatch: A Better Way to Tail Logs From Every AWS Service

A Tale of Two Log Types: Correlating Application and Infrastructure Events

AI-Generated Code Runtime Errors: Surface What’s Breaking in Your Stack

Cursor AI Run With Confidence: Debug What Cursor Ships, Fast

Gonzo Pro Tips & Roadmap: 30 Minutes With the Maintainers

You Can Vibe Code. You Can’t Vibe Debug.
