The emergence of “AI SRE” as a product category name sets a dangerous precedent – especially for such an early market. By quickly leaping out of the gate as the accepted search term, it has created a snowball effect for marketers and thus a forcing function for everyone rushing into this market. That includes the whole ecosystem, from the monolithic big dogs of observability to young, emerging startups.
It’s a Floor Wax – No, a Dessert Topping
As Sebastian Vietz from SRE Insights so eloquently pointed out in his SABRO Framework blog, “SRE is a discipline — not a SKU.”
So what’s an “AI SRE”?
- An autonomous replacement for the human SRE?
- A teammate, assistant, co-pilot, or buddy?
- Or—like the classic SNL sketch—both a floor wax and a dessert topping?
AIOps – so 2010s?
Gartner coined the term AIOps – Artificial Intelligence for IT Operations – back in 2016. Many IT teams and larger O11y vendors still use the term, but it has conveniently disappeared from the marketing lexicon of most “AI SRE” startups.
But…the Gartner definition does give one pause:
“AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.”
This does sound suspiciously like the architectural approach of many “AI SRE” tools, which mine large, existing observability data lakes and data sources and apply newer agentic and LLM technologies in service of a more modern role, the SRE. On the other hand, there are only so many words to describe the same or maybe slightly different things.
Observability Engineering – a Missing Link
I fear the focus on the SRE role overemphasizes the “R,” or reliability, part of the name. Reliability is an outcome that relies on a variety of supporting disciplines or, depending on how broadly one defines “SRE,” perhaps subdisciplines.
Honeycomb – who helped write the actual book on Observability Engineering – defines the observability engineer role as “team members charged with building data pipelines, monitoring, working with time series data, and maybe even distributed tracing and security.”
Observability Engineering is an essential part of an SRE’s role, of the discipline, and of any SRE team, but it is overlooked today when the “R” leads us to focus on the outcomes – correlation, causality, and root cause – rather than on how to get to the answers. Observability Engineering will be a critical component of any AI designed for SREs – whether that’s augmentation or replacement.
More Iron Man Suits
Andrej Karpathy brings a wealth of wisdom to this discussion from his time at OpenAI and as Sr. Director of AI at Tesla, where he led the computer vision team of Tesla Autopilot. He advocates partial autonomy, using the analogy of more “Iron Man suits” and fewer robots or flashy autonomous agent demos.
If you are enabling SREs, an Iron Man suit is an empowering concept. Combined with an “autonomy slider” – taken from Andrej’s Tesla Autopilot experience and clearly part of the Iron Man suit’s capabilities – it leaves the SRE in charge of how much or how little autonomy to engage.
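To make the “autonomy slider” idea a little more concrete, here is a minimal sketch of how a tool might decide whether an action needs a human in the loop. Everything in it is hypothetical and for illustration only: the three slider positions, the `Action` type, the `low_risk` flag, and the `needs_human_approval` function are my own assumptions, not anything Karpathy or any vendor has specified.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical slider positions, from least to most autonomous."""
    SUGGEST_ONLY = 0   # the tool only proposes actions
    LOW_RISK_AUTO = 1  # the tool runs low-risk actions on its own
    FULL_AUTO = 2      # the tool runs every action on its own


@dataclass(frozen=True)
class Action:
    """An illustrative action an AI tool might want to take."""
    name: str
    low_risk: bool  # assumption: someone has classified the action's risk


def needs_human_approval(action: Action, slider: Autonomy) -> bool:
    """Return True when the SRE must approve the action before it runs."""
    if slider == Autonomy.FULL_AUTO:
        return False
    if slider == Autonomy.LOW_RISK_AUTO:
        return not action.low_risk
    # SUGGEST_ONLY: nothing runs without the SRE saying so.
    return True
```

The point of the sketch is that the SRE sets the slider, not the tool. At `SUGGEST_ONLY` every action waits for approval, and only an explicit move up the slider lets anything run unattended.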
AI for SREs
Back to the original question of “AI SRE” or “AI for SREs” – the more accurate and empowering vision for this product category is “AI for SREs.” It is close enough to Sebastian Vietz’s “AI Tooling for SREs” that either name works for me. At least for now!
A previous post focused on “AI and a TUI: Practical Logging Tools for SREs.” I believe we should all focus on the qualities in that title: practical, tools, and who they are for – the SRE.
The discussions and debates will undoubtedly continue as the market develops, and I welcome feedback.
press@controltheory.com