

Logs to S3 and Rehydration

3m 43s
May 22, 2025
By Jon Reeve

Welcome to this second installment of "OTel in a minute" (or so): short bites of OpenTelemetry that you can use. The reality is that many of us are sending a lot of telemetry (logs, metrics, traces) to our observability backends and vendors, and since we're paying for the ingest, storage, or indexing of all that data, observability bills can get out of hand pretty quickly.

And then there's a class of information we need to collect and retain "just because": think logs kept for compliance or audit purposes. Do we really need to pay top dollar to store all of those with our observability vendor, all the time?

What if we had a way to send a copy of our telemetry to low-cost "cold storage" like AWS S3 (and from there, perhaps even "colder" tiers like Glacier), and send only the important, "high signal" information to our observability vendor, reducing both cost and noise there?

But our auditor might need to see that data at audit time, so it would be great to have a way to get it back, or "rehydrate" selected data into our observability tool of choice, if we need to.

Enter the OpenTelemetry Collector. Let's take a look at how we can set up a pipeline to send a copy of our logs to S3, and then how we can rehydrate them as needed. We can do this with metrics and traces as well, but we'll focus on logs here, as they are low-hanging fruit for many folks. As a CTO recently said to us: "almost 100% of logs go unread."

In this setup, we have three existing pipelines flowing our logs, metrics, and traces through to our observability backend of choice. Let's add another pipeline that sends a copy of all our logs to AWS S3.
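As a sketch, the service section might look something like this; the otlp receiver, batch processor, and otlphttp exporter are placeholders for whatever your existing pipelines actually use:

```yaml
service:
  pipelines:
    # Existing pipelines to our observability backend (placeholder components)
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    # New pipeline: the same receiver fans a copy of every log out to S3
    logs/s3:
      receivers: [otlp]
      processors: [batch]
      exporters: [awss3]
```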

To do this, we're going to use the AWS S3 exporter as a destination. It's available in the collector contrib repository, and it essentially takes your telemetry, writes it to an S3 bucket in a region of your choice, and partitions it into one-minute folders by default. For authentication to AWS, we're running our collector as a Docker container and injecting the credentials as environment variables here; you'll want to use something like Vault in production, or you can rely on AWS CLI credentials on the host where your collector runs. We're going to send the data in the default format, OTLP JSON.
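Here's a minimal sketch of the exporter configuration, with a hypothetical bucket name; the field names follow the contrib awss3exporter README, but check them against your collector version:

```yaml
exporters:
  awss3:
    s3uploader:
      region: us-east-1           # bucket region of your choice
      s3_bucket: my-otel-archive  # hypothetical bucket name
      s3_prefix: logs
      s3_partition: minute        # one folder per minute (the default)
    # marshaler defaults to otlp_json, so no need to set it explicitly
```

And one way to inject the credentials for the Docker setup (a sketch for demos only; prefer Vault or IAM roles in production) is a docker-compose file, since the exporter picks up the standard AWS environment variables through the SDK's default credential chain:

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./config.yaml:/etc/otelcol/config.yaml
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=us-east-1
```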

Awesome, let's take a look at our AWS console: we can see our logs being placed into our specified bucket with the default one-minute folders.

Great: now we can heavily optimize the logs going to our observability vendor with confidence, knowing we're retaining a copy affordably in S3.

But what if we do get that call from our auditor, and we need to get some logs back or “rehydrate” them?

Well, AWS S3 can also act as a source, or "receiver," for our OTel Collector, serving data we already have stored there. There is a receiver in contrib called the AWS S3 receiver; we need to specify an S3 bucket and prefix like before, plus a start time and end time for the data we want to retrieve out of the bucket, this time in our receiver configuration. And notice that we needed to add yet another pipeline to get the data out of S3. We're starting to see how collector configurations need to be treated as ephemeral, and why we need something to help us change and orchestrate them.
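A sketch of what that rehydration configuration could look like (the bucket name is hypothetical, and the starttime/endtime fields follow the contrib awss3receiver README; verify against your collector version):

```yaml
receivers:
  awss3:
    s3downloader:
      region: us-east-1
      s3_bucket: my-otel-archive  # same hypothetical bucket as before
      s3_prefix: logs
      s3_partition: minute
    # the time window of archived data to pull back out
    starttime: "2025-05-01 00:00"
    endtime: "2025-05-01 06:00"

service:
  pipelines:
    # Extra pipeline just for rehydration: S3 in, backend out
    logs/rehydrate:
      receivers: [awss3]
      processors: [batch]
      exporters: [otlphttp]
```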

Let's take a look. We used a trick from our last video and set up the remote tap processor to watch the telemetry being pulled back out of S3 before it goes on to our observability backend of choice.
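As a rough sketch, dropping the remotetap processor into the rehydration pipeline might look like this (port 12001 is the processor's documented default; double-check against your contrib version):

```yaml
processors:
  remotetap:
    port: 12001  # exposes a WebSocket endpoint for live tapping

service:
  pipelines:
    logs/rehydrate:
      receivers: [awss3]
      processors: [remotetap, batch]  # tap the stream on its way through
      exporters: [otlphttp]
```

Attaching any WebSocket client (for example, `wscat -c ws://localhost:12001`) should then let you watch the rehydrated records flow through.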

Well, that's today's OTel in a minute (or so). We've seen how it's possible to have our telemetry cake and eat it too: we can reduce our observability vendor data volumes and improve signal-to-noise by sending a copy of our telemetry to AWS S3, knowing we can get it back if we need it. Until next time.