It's 2 AM and you don't know what changed
The single question every on-call engineer asks first, and why the tools we already pay for don't answer it. The case for a small, change-aware incident-response tool.
It is 2:14 AM. Your phone is on the nightstand and it just lit the room. PagerDuty: api-checkout 5xx > 5%. You sit up, find your laptop, and open the first dashboard. The graph is red. The graph has been red for ninety seconds.
You know exactly one useful thing right now: something just changed. A deploy went out. A feature flag flipped. Someone pushed a config update. The graph turning red is downstream of one of those events, and you do not yet know which.
So you start opening tabs.
The browser-tab triage
GitHub, to scan the last twenty merged PRs. Slack #deploys, because that bot posts every push. The deploy tool, because the bot lags by a minute. The feature-flag dashboard, because Kenji flipped something at 8 PM and you cannot remember which thing. CloudTrail, because there was a Terraform apply this week. The on-call runbook in Notion, because you need to remember the rollback command for the checkout service.
Seven tabs. You are mentally diffing them against each other while a customer somewhere is failing to pay you. The graph is still red.
Twenty-seven minutes in, you find it: it was the deploy at 02:13:51 — PR #842, a four-line change to the payment retry logic. You roll it back. The graph cools off six minutes later. You write a thirty-line Slack thread, set up a calendar reminder for the post-mortem on Thursday, and you go back to bed at 03:42.
Thursday's post-mortem will have an action item that reads, verbatim: "Add better tooling to surface recent changes during incidents."
This is the universal scene
We have done it. Everyone reading this has done it. Some of you did it this week. The shape is the same at every company you have worked at, with every alerting stack, in every cloud:
- An alert fires.
- You assume something changed.
- You go on a manual hunt across seven tools to figure out which.
- You guess. You roll something back. You hope.
- Eventually it works. You go to bed exhausted.
- The post-mortem promises better tooling that never gets built.
At big companies there is usually a team for this — an SRE org or an incident-commander rotation that owns the tooling. At small companies there is not. The on-call engineer is the entire incident-response function, and the tooling they get is the tooling they paid for: a paging service, a metrics dashboard, and the patience to alt-tab between them.
Why the tools you have do not answer "what changed?"
Datadog tells you what is broken right now. PagerDuty tells you that someone got paged. GitHub tells you what your team merged this week. None of them — and this is the load-bearing observation — none of them tell you, at the moment you ask, which of the things you recently shipped is most likely the cause of the thing that is broken.
That correlation does not exist in your stack today. It exists in your head, and your head is not at its best at 2 AM.
What a tool for the 2 AM moment looks like
We have been building one. It is opinionated:
- The on-call engineer is the only customer. Not the VP of Engineering, not the dashboard committee, not procurement. If a feature does not help someone at 2 AM, half-asleep, sometimes on their phone, it does not ship.
- Five minutes to value. Point your alerting tool's webhook at a Regle URL and send your deploys through the same hook (or the API). If you cannot see something useful in five minutes, we have failed.
- Rank, do not list. Showing you all recent changes is a worse version of what you already get from Slack. Ranking the ones most likely to have caused this specific alert — by service overlap, timing, and historical correlation — is the actual job.
- AI invisible. We use LLMs to rank likely causes and draft post-mortems from the captured timeline. We do not put a chat sidebar in your face. If you have to type a question, we failed.
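To make "rank, do not list" concrete, here is a minimal sketch of a heuristic scorer that combines the three signals named above. It is not how Regle ranks changes; the post says that job uses LLMs. Everything here is an illustrative assumption: the `Change` and `Alert` shapes, the `past_incident_rate` field standing in for historical correlation, the 24-hour lookback window, the linear recency decay, and the 0.5/0.3/0.2 weights.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A recent change event (deploy, flag flip, config push). Hypothetical shape."""
    id: str
    services: set[str]
    at: datetime
    # Hypothetical stand-in for historical correlation: the fraction of past
    # changes like this one that were followed by an incident.
    past_incident_rate: float = 0.0


@dataclass
class Alert:
    """The alert being triaged. Hypothetical shape."""
    service: str
    fired_at: datetime


def score(change: Change, alert: Alert, window: timedelta = timedelta(hours=24)) -> float:
    """Combine service overlap, timing, and historical correlation into one score."""
    age = alert.fired_at - change.at
    # A change made after the alert, or older than the window, is not a candidate.
    if age < timedelta(0) or age > window:
        return 0.0
    overlap = 1.0 if alert.service in change.services else 0.0
    # Linear decay: 1.0 for a change right before the alert, 0.0 at the window edge.
    recency = 1.0 - age / window
    # Illustrative, hand-picked weights; a real system would tune or learn them.
    return 0.5 * overlap + 0.3 * recency + 0.2 * change.past_incident_rate


def rank(changes: list[Change], alert: Alert,
         window: timedelta = timedelta(hours=24)) -> list[Change]:
    """Return in-window changes, most likely cause first."""
    scored = [(score(c, alert, window), c) for c in changes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in scored if s > 0]


if __name__ == "__main__":
    alert = Alert("api-checkout", datetime(2025, 1, 2, 2, 14, 0))
    changes = [
        Change("search config push", {"api-search"}, datetime(2025, 1, 2, 1, 50), 0.3),
        Change("PR #842 deploy", {"api-checkout"}, datetime(2025, 1, 2, 2, 13, 51), 0.1),
        Change("old terraform apply", {"api-checkout"}, datetime(2024, 12, 30, 12, 0), 0.2),
        Change("checkout flag flip", {"api-checkout"}, datetime(2025, 1, 1, 20, 0), 0.05),
    ]
    print([c.id for c in rank(changes, alert)])
    # prints ['PR #842 deploy', 'checkout flag flip', 'search config push']
```

The deploy nine seconds before the alert, on the alerting service, lands first; the 8 PM flag flip on the same service comes second; the config push to an unrelated service trails; and the Terraform apply from days earlier drops out of the window entirely. A weighted sum keeps each signal's contribution easy to explain at 2 AM, which matters more here than squeezing out accuracy.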
The smallest possible promise
We are not promising to replace Datadog. We are not promising to replace PagerDuty. We are not building a "unified observability platform." The pitch is much smaller and much more honest:
The next time you get paged at 2 AM, Regle answers "what changed?" in seconds, so you can act on the right thing instead of guessing across seven tabs.
If you are an on-call engineer at a 10–80-engineer company who has lived through this exact 2 AM, you already know whether you want this. We would love to have you in our design-partner cohort: free for six months, with a direct Slack line to both founders. The only catch is that you tell us honestly when we get it wrong.