Lambda · ECS/Fargate · CloudWatch · OpenTelemetry

AWS Serverless Log Analysis —
Möbius Finds the Lambda and Fargate Failures You’d Spend an Hour Looking For.

Serverless ships fast. Debugging does not. A single request crosses Lambda concurrency, Fargate tasks, API Gateway, and queues, and the evidence lands in a dozen log groups. Stop grepping CloudWatch. Dstl8’s AI agent reads the logs, finds the pattern, names the cause, and hands you the fix.

4–8 hrs

TYPICAL MANUAL CORRELATION TIME ACROSS LAMBDAS

15–30 min

SAME INCIDENT WITH AI-LED ANALYSIS

50

LOG GROUP CAP PER CLOUDWATCH INSIGHTS QUERY

1

UNIFIED INCIDENT VIEW INSTEAD OF N LOG GROUPS

2 min

FROM brew install gonzo TO FIRST USABLE STREAM

lambda failed in prod → which invocation? which stream? why?

cloudwatch insights 50 log group limit → query hits a wall

api gateway → lambda → sqs → lambda → which hop broke?

fargate task exit code 137 → oom? healthcheck? deployment?

x-ray 5% sampling → your failed request wasn’t one of them

cloudwatch insights cross account → single region, single account

adot lambda layer cold start → observability tax on every invoke

one request across 47 log streams → good luck

claude code deployed the fix → what did it actually do at runtime?

aws logs tail | gonzo → live signal in the terminal

Six Ways AWS Serverless Debugging Eats Your Afternoon.

CloudWatch has logs. The incident path does not. A real serverless failure spans log groups, concurrent instances, accounts, and services, and CloudWatch asks you to reconstruct all of it manually while the page is still open. These are the structural reasons time-to-answer stays long.

01

You’re the one doing the correlation

A user request touches half a dozen Lambdas and a Fargate task. Each Lambda has its own log group. Each group splits into a log stream per concurrent instance. CloudWatch gives you the raw entries. Stitching them into an incident timeline is manual work.

# CloudWatch shows
/aws/lambda/checkout-api 47 streams
/aws/lambda/webhook-processor 12 streams
/aws/ecs/payment-worker 6 streams

# you want
one timeline for request req-8f507cfc

# the gap: 4-8 hours of grepping
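
What that stitching actually looks like is the same filter run against every log group on the path, then merged by timestamp by hand. A rough sketch, assuming the request ID appears verbatim in each service's output (the group names and timestamps here are illustrative):

# run per log group; --start-time / --end-time are epoch milliseconds
aws logs filter-log-events \
  --log-group-name /aws/lambda/checkout-api \
  --filter-pattern '"req-8f507cfc"' \
  --start-time 1714000000000 --end-time 1714003600000
# ...repeat for webhook-processor, payment-worker, and every other hop,
# then stitch the results into one timeline yourself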

02

CloudWatch Insights runs out of room fast

Teams that commit to Insights hit ceilings mid-incident, and every query scans raw gigabytes, billing you a second time for data you already paid to ingest.

concurrent queries: 30 max (Standard)
log groups per query: 50 max
records returned: 10,000 max
cross-account: unsupported
cross-region: unsupported

# the incident is bigger than the tool
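
For reference, a single Insights run from the CLI looks like the sketch below; the group names and request ID are illustrative, and the --log-group-names list is exactly where the 50-group cap bites.

aws logs start-query \
  --log-group-names /aws/lambda/checkout-api /aws/lambda/webhook-processor /aws/ecs/payment-worker \
  --start-time 1714000000 --end-time 1714003600 \
  --query-string 'fields @timestamp, @log, @message | filter @message like /req-8f507cfc/ | sort @timestamp asc'

aws logs get-query-results --query-id <query-id>
# results top out at 10,000 records, one account, one region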

03

X-Ray doesn’t see the request that failed

X-Ray samples roughly 1 request per second plus 5% of the rest. When your customer hits a single-request failure, the trace almost certainly doesn’t exist. You fall back to logs anyway.

X-Ray coverage: ~5%
single-request failure: 95% chance no trace
next step: back to log groups
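
You can check what your account is actually sampling; the default rule ships with a one-request-per-second reservoir plus a 5% fixed rate, which is where the numbers above come from.

aws xray get-sampling-rules
# default rule: ReservoirSize 1, FixedRate 0.05
# a one-off failed request is ~95% likely to have no trace at all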

04

The CloudWatch UI is not built for active debugging

Page refresh wipes search state. No severity filter in the native view. Log streams split when concurrent instances scale. Export is limited to what’s currently on screen. You lose context exactly when you need it most.

> 47-minute scroll back
> refresh
> state: gone
> stream ended, new stream started
> find the same request: start over

05

OpenTelemetry helps, but the default path taxes cold starts

ADOT layers solve vendor lock-in, but they bundle auto-instrumentation, a collector extension, and a trace pipeline that add memory and cold-start overhead to every function. Most teams want that pipeline for tracing, not for logs. If all you need is structured logs shipped somewhere useful, the tax is pure overhead.

+ ADOT layer memory bump
+ auto-instrumentation +100-300ms cold start
+ collector extension must flush before sandbox freeze
# all of that, for log output you already wanted to ship
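
For contrast, the default ADOT path attaches a layer and an exec wrapper to every function, which is where that overhead comes from. Treat the snippet below as a sketch rather than a recipe: the layer ARN is a placeholder and the wrapper path varies by runtime.

aws lambda update-function-configuration \
  --function-name checkout-api \
  --layers <adot-lambda-layer-arn> \
  --environment "Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument}"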

06

CloudWatch Logs is a per-GB tax on every axis

Ingestion, storage, and query scanning all bill per GB. The defaults, indefinite retention on the Standard tier, are the most expensive path.

Lambda cost: $205 / yr
CloudWatch cost: $10,480 / yr
# same function
# retention: default (indefinite)
# one dev left full payload logging on
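
A rough reconstruction of how a bill in that range happens, assuming roughly $0.50 per GB ingested (typical us-east-1 list price for the Standard tier) and full request/response payloads logged on every invocation:

~57 GB/day ingested x $0.50/GB ≈ $28.50/day
$28.50/day x 365 ≈ $10,400/yr   # before storage and query-scan charges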

WHY THIS MATTERS

AWS serverless does not fail because it lacks logging. It fails in production when one request path crosses Lambda concurrency, Fargate tasks, multiple log groups, and sometimes multiple accounts. The debugging problem is time-to-answer. Raw log availability is not the bottleneck. Intelligent correlation is.

How AWS Serverless Teams Cut Time-to-Answer From Hours to Minutes.

The goal is not another place to read isolated log streams. The goal is an AI agent that reads the logs for you, correlates across services, names the cause, and tells you where to go next. Möbius does the first hour of triage before you sit down.

Möbius reads logs continuously and surfaces incidents before you open the dashboard

Dstl8’s AI agent watches your log streams for emerging failure patterns across Lambda, Fargate, and anything else you point at it. You find out because Möbius tells you, not because a customer did.

One view per service, regardless of concurrency

Dstl8 flattens log streams from every concurrent Lambda instance and every Fargate task into one logical service view. “checkout-api” is one place to look, not 47 separate streams that shift every time traffic scales.

Root cause narrative with named evidence

Dstl8 doesn’t just group related log lines. Möbius writes a short diagnosis: what happened, which service started it, which events confirm it, and the first thing to try. Every claim links back to the underlying log entries.

Query incidents from Claude Code, Cursor, or your terminal via MCP

The AI agent is reachable from the editor that wrote the code. Ask “what broke in checkout-flow in the last hour” from Claude Code and get an answer built on your live production logs, not a guess.
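
Hooking that up follows the standard MCP pattern: register a server in your editor's MCP config (for Claude Code, an .mcp.json at the project root) and ask in plain language. The server name, package, and key variable below are hypothetical placeholders, not Dstl8's published configuration; check the docs for the real values.

{
  "mcpServers": {
    "dstl8": {
      "command": "npx",
      "args": ["-y", "<dstl8-mcp-server-package>"],
      "env": { "DSTL8_API_KEY": "<your-key>" }
    }
  }
}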

Bring your OpenTelemetry logs straight in

Add an OTel log exporter to your app code and point it at Dstl8. No Lambda layer, no auto-instrumentation agent, no collector extension running alongside every invocation, no cold-start surcharge. CloudWatch ingestion also works day one for the workloads you want to leave there.

What AWS Serverless Teams Get With ControlTheory.

Proactive root cause analysis, with receipts.

Möbius watches your log surface continuously, clusters related failures into incidents, names the likely cause, and shows the exact log lines that support the diagnosis. You walk into the investigation with the answer already drafted. Ask follow-ups from your editor or terminal without switching contexts.

Faster answers when “which Lambda failed” is the symptom, not the cause.

1 service view across every concurrent invocation
The question stops being “which of the 47 log streams had the error?” and becomes “what pattern is Möbius seeing, and in which service?” Dstl8 flattens concurrent Lambda instances into one service view and ranks what’s actually broken.

One incident view for Lambda errors, Fargate failures, and API Gateway 5xxs.

You don’t need another isolated log viewer. You need one list of active incidents, grouped by related signal, ranked by impact, with enough context to tell whether the failure started in function code, a Fargate container crash, or a downstream throttle. Möbius does that ranking continuously, so the first thing you see is the thing worth fixing.

Better Fargate task logs when container failures don’t happen in isolation.

Task-level stdout logs are useful. The real speedup happens when a Fargate container exit is automatically correlated with the upstream Lambda invocations, API Gateway 5xxs, and adjacent queue behavior that caused it. Ingest via the existing awslogs driver. No task redefinition, no sidecar.
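
The awslogs block you already have in the task definition is all the plumbing this needs; a typical one looks like the fragment below, with the group and prefix names illustrative.

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/aws/ecs/payment-worker",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "payment-worker"
  }
}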

OpenTelemetry logs from your app, no Lambda layer required.

Add an OTel log exporter to your application code and point it at Dstl8. That’s it. No ADOT Lambda layer, no collector extension running alongside every invocation, no cold-start surcharge from auto-instrumentation. Your function emits structured OTLP logs as part of normal execution. If you already run ADOT for traces, keep it — just route the log signal to Dstl8.
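
If your runtime's OpenTelemetry SDK reads the standard OTLP exporter environment variables, the shipping side can be as small as the settings below. The endpoint and header are placeholders, not Dstl8's actual ingest address; use whatever your account provides.

OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://<your-dstl8-ingest-endpoint>/v1/logs
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=<your-key>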

Start with terminal-native streaming before you commit to a bigger platform workflow.

2 min to first usable stream
Use Gonzo to tail CloudWatch log groups from your terminal, across Lambda, Fargate, RDS, or any other AWS service that lands logs in CloudWatch. Timestamps in your local zone, flexible filters, and AI-powered summarization on demand. Bring in Dstl8 when you want Möbius doing the proactive detection, cross-service correlation, and in-editor Q&A against your full log surface.

AWS Serverless Debugging: How Teams Actually Solve This.

Capability | CLOUDWATCH CONSOLE ONLY | LOG DRAIN + GENERIC TOOL | ControlTheory
Proactive incident detection | watch manually | rules required | Möbius, always on
Unified view across Lambda instances and Fargate tasks | per log group, per stream | export only | one service view
Cross-account, cross-region log surface | unsupported | depends on setup | first-class
Pattern and anomaly detection across services | manual | rules required | continuous
Root cause narrative with cited evidence | | | Möbius diagnosis
Query incidents from Claude Code / Cursor / terminal | | | MCP-native

Start With Gonzo in Under 2 Minutes.

Open source terminal UI. No account, no agent, no configuration. Tail CloudWatch log groups and see what your Lambda is actually doing before you approve the next AI-generated patch.

Install Gonzo

Gonzo tails your log streams, surfaces patterns by severity, and sends individual entries to an LLM for explanation — all from your terminal. It’s the fastest way to bring real production signal into your AI SDLC workflow.

Homebrew

Go

Binary

Nix

Source

brew install gonzo

go install github.com/control-theory/gonzo/cmd/gonzo@latest

# Download the latest release for your platform from the releases page:
# github.com/control-theory/gonzo/releases

nix run github:control-theory/gonzo

git clone https://github.com/control-theory/gonzo.git
cd gonzo
make build

Usage examples:

# Tail a CloudWatch log group:

aws logs tail "/aws/lambda/checkout-api" --follow --format json | gonzo
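
# Narrow the window or point at a Fargate service the same way; --since and --format
# are standard aws-cli v2 flags, and the group name here is just an example:

aws logs tail "/aws/ecs/payment-worker" --follow --since 1h --format json | gonzo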

Stop Grepping CloudWatch. Let Möbius Find It.

Stream logs with Gonzo. Let Möbius detect, correlate, and diagnose with Dstl8. Trace Lambda errors, Fargate failures, and API Gateway 5xxs across services in one place, or ask your AI agent about them from Claude Code.

More for AI code generation reliability.

Gonzo + AWS CloudWatch: A Better Way to Tail Logs From Every AWS Service

A Tale of Two Log Types: Correlating Application and Infrastructure Events

AI-Generated Code Runtime Errors: Surface What’s Breaking in Your Stack

Cursor AI Run With Confidence: Debug What Cursor Ships, Fast

Gonzo Pro Tips & Roadmap: 30 Minutes With the Maintainers

You Can Vibe Code. You Can’t Vibe Debug.
