Why Every SRE Team Needs an LLM-Powered Blameless Postmortem Bot

AI Reliability

I’ve written hundreds of postmortems in my career. They all follow the same pattern:

Timeline
Root cause
What went well
What could have been better
Action items

After the 100th one, I wanted to check myself into an asylum (Not really, but it sure sounded good at the time). So I built a tool that eats the On-call scheduling timeline (Pagerduty) + Jira tickets + Datadog metrics + Slack thread and spits out a 90%-complete postmortem. Then the human just edits the juicy bits and can spend time on the post-mortem discussion, and not the monotonous preparations.

I rolled it out and immediately saw the benefits. The result?

Postmortem prep went from taking 4–8 hours → ~45 minutes
Quality actually went up because the bot is brutally honest
People started writing “what went well” sections without me having to nag them

The prompt is gloriously savage: “You are a deeply cynical but fair SRE principal who has seen every possible failure mode. Write this postmortem in the style of a Netflix tech blog but with zero corporate fluff.” Guided by this prompt and some for structured output the post-mortems maintained a consistent structure and robust framework.

I plan to open-source the whole thing in 2026, and perhaps others will find it beneficial.

Until then, steal this idea. Your future self will thank you when the incident is over and you dont have to spend hours writing the preparation material.

Tags:Automation continuous-improvement Post-mortems

Leave a Reply Cancel reply