Product · May 2026 · 7 min read

Anatomy of an alert worth waking up for

Alert fatigue is not caused by too many alerts. It's caused by too many useless alerts — ones that state a fact without telling you whether to care, why it's happening, or what to do. After enough of those, a human learns to swipe the notification away without reading it. The most dangerous monitoring failure isn't a missing alert; it's a real one that arrives looking exactly like the noise you've been trained to ignore.

So the question worth obsessing over isn't "what should we alert on?" It's "what does a single alert need to contain for a tired engineer to act on it correctly in thirty seconds?" In our experience there are four parts, and most tooling ships only the first.

1. Symptom — what is happening, with numbers

The observable fact, quantified. Not "Redis memory high" but "used_memory on cache-02 is climbing 42 MB/min against a 4 GB maxmemory." The symptom anchors the alert to reality and lets the reader sanity-check it instantly. Vague symptoms are where trust dies: an engineer who's been burned by "high" or "elevated" with no number learns to distrust the whole channel.
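A quantified symptom mostly comes down to turning raw samples into a rate and stating it next to the limit it is heading toward. Below is a minimal sketch, assuming memory samples arrive as (seconds, bytes) pairs. `slope_per_minute` and `symptom_line` are hypothetical helpers for illustration, not part of Redis or any monitoring library.

```python
MB = 1024 * 1024
GB = 1024 * MB


def slope_per_minute(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (seconds, bytes) samples, in bytes per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 60  # bytes/second -> bytes/minute


def symptom_line(metric: str, host: str, samples: list[tuple[float, float]],
                 limit_bytes: float, limit_name: str = "maxmemory") -> str:
    """State the observation with its rate and the limit it is heading toward."""
    rate = slope_per_minute(samples)
    direction = "climbing" if rate >= 0 else "falling"
    return (f"{metric} on {host} is {direction} {abs(rate) / MB:.0f} MB/min "
            f"against a {limit_bytes / GB:.0f} GB {limit_name}")


# Ten samples, one per minute, each 42 MB higher than the last.
samples = [(i * 60, 1 * GB + i * 42 * MB) for i in range(10)]
print(symptom_line("used_memory", "cache-02", samples, 4 * GB))
# used_memory on cache-02 is climbing 42 MB/min against a 4 GB maxmemory
```

Fitting a line over the whole window, rather than diffing the first and last sample, keeps a single noisy reading from inflating or hiding the rate.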

2. Cause — the likely why, correlated across signals

This is the part almost nobody automates, because it requires looking at more than one metric. A good cause statement connects the symptom to its probable driver: memory is climbing and maxmemory-policy is noeviction and key count in db0 is growing linearly. Any one of those alone is trivia. Together they're a diagnosis. The whole reason dashboards feel exhausting is that they make the human do this join, manually, across a dozen panels, every single time.
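The join described above can be expressed as a rule that only emits a cause when all three signals agree. This is a sketch under assumptions: `RedisSignals`, `grows_linearly`, `diagnose`, and the 0.95 R² threshold are hypothetical choices for illustration, and how the signals get collected (for example via `CONFIG GET maxmemory-policy`) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedisSignals:
    """Hypothetical bundle of signals gathered for one Redis instance."""
    memory_slope_bytes_per_min: float
    maxmemory_policy: str          # e.g. as reported by CONFIG GET maxmemory-policy
    db0_key_counts: list[int]      # key counts sampled at even intervals


def grows_linearly(series: list[int], min_r2: float = 0.95) -> bool:
    """True if the series rises and a straight line explains most of its variance."""
    n = len(series)
    if n < 3:
        return False
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    ss_tot = sum((y - mean_y) ** 2 for y in series)
    if ss_tot == 0:
        return False  # perfectly flat: not growing
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * ss_tot)  # R^2 of a simple linear fit
    return slope > 0 and r2 >= min_r2


def diagnose(s: RedisSignals) -> Optional[str]:
    """Emit a cause only when all three signals agree; any one alone is trivia."""
    memory_climbing = s.memory_slope_bytes_per_min > 0
    no_eviction = s.maxmemory_policy == "noeviction"
    keys_growing = grows_linearly(s.db0_key_counts)
    if memory_climbing and no_eviction and keys_growing:
        return ("memory is climbing, maxmemory-policy is noeviction, and db0 key "
                "count is growing linearly: writes will be rejected once maxmemory is hit")
    return None
```

Returning `None` when the signals disagree is deliberate: an alert that cannot name a correlated cause should say so rather than guess.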

3. Blast radius — who and what this takes down

Severity is not a property of a metric; it's a property of consequence. "Heap at 80%" is a number. "When this node runs out of heap and drops out of the cluster in ~25 minutes, primary shards for the orders index go unassigned and writes stall" is a reason to stand up. Blast radius is what lets an on-call engineer triage correctly: escalate the thing that takes down checkout, defer the thing that degrades a nightly batch job. Without it, every alert competes for attention as if it were equally urgent, which is its own kind of noise.
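Blast radius combines two estimates: how long until the resource runs out, and what sits downstream of it. A rough sketch, assuming linear growth and a hand-maintained reverse-dependency map; the component names, the `critical` set, and the "page"/"ticket" labels are all hypothetical.

```python
from collections import deque
from typing import Optional


def minutes_until_limit(current: float, limit: float, rate_per_min: float) -> Optional[float]:
    """Linear extrapolation; None if the resource is not growing."""
    if rate_per_min <= 0:
        return None
    return max(0.0, (limit - current) / rate_per_min)


def downstream(component: str, dependents: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk of a reverse-dependency map (X -> things that depend on X)."""
    seen: set[str] = set()
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def severity(component: str, dependents: dict[str, list[str]], critical: set[str]) -> str:
    """Page only if something user-facing sits downstream; otherwise file a ticket."""
    return "page" if downstream(component, dependents) & critical else "ticket"


# Hypothetical topology for illustration.
DEPENDENTS = {
    "es-node-3": ["orders-index"],
    "orders-index": ["checkout-api"],
    "batch-worker-1": ["nightly-report"],
}
CRITICAL = {"checkout-api"}

print(minutes_until_limit(current=6400, limit=8000, rate_per_min=64))  # heap in MB, 80% used
# 25.0
print(severity("es-node-3", DEPENDENTS, CRITICAL))       # page
print(severity("batch-worker-1", DEPENDENTS, CRITICAL))  # ticket
```

Real heap usage is sawtoothed by garbage collection, so a production estimate would fit the trend of post-collection lows rather than raw samples; the linear version here only shows the shape of the calculation.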

4. Fix — the specific next action

The single highest-leverage thing an alert can include is the command. Not "investigate memory usage"; that's a homework assignment. Instead: set an eviction policy with CONFIG SET maxmemory-policy allkeys-lru (appropriate only if the instance is a pure cache, since that policy may evict any key), or raise the ceiling, then investigate the key growth in db0. Even when the fix is provisional, naming a concrete first step collapses the time between reading the alert and changing the system, which during an incident is the only clock that matters.
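One way to keep the fix concrete is to generate it from the diagnosis instead of writing it by hand per alert. A minimal sketch, assuming a hypothetical diagnosis id `redis-noeviction-growth` and a caller-supplied `is_pure_cache` flag; `<new-limit>` is a deliberate placeholder, since the right ceiling depends on the host.

```python
def fix_for(diagnosis: str, is_pure_cache: bool) -> str:
    """Turn a diagnosis into a copy-pasteable first step plus a follow-up."""
    if diagnosis != "redis-noeviction-growth":
        return "Fix: no runbook entry yet; escalate with the symptom and cause"
    raise_ceiling = "CONFIG SET maxmemory <new-limit>"  # placeholder: size from host RAM
    if is_pure_cache:
        # allkeys-lru can evict any key, so only lead with it when keys are disposable.
        first = ("CONFIG SET maxmemory-policy allkeys-lru "
                 f"(or raise the ceiling: {raise_ceiling})")
    else:
        first = raise_ceiling
    return f"Fix: {first}, then investigate key growth in db0"


print(fix_for("redis-noeviction-growth", is_pure_cache=True))
print(fix_for("redis-noeviction-growth", is_pure_cache=False))
```

Branching on `is_pure_cache` keeps the alert from recommending a command that would silently drop data on an instance used as a primary store.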

Why "more data" made this worse, not better

The industry's reflex for the last decade was to add more metrics, more dashboards, more retention. But data and decisions are different products. A 60-panel dashboard is a magnificent way to explore a problem you already know you have, and a terrible way to be told you have one. Adding panels increases the surface area a human must scan to reach a conclusion. Past a point, more observability data actively lengthens time-to-understanding, because the signal is now buried in more places.

The fix is not less data — keep all of it for when you need to dig. The fix is a layer that does the reading for you and emits conclusions. The dashboard becomes the thing you open to verify the conclusion, not the thing you parse to reach one.
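A layer that "emits conclusions" can be as simple as a record that refuses to exist without all four parts and puts the dashboard link last, as evidence. A sketch only: the `Insight` type, the field order, and the example URL are hypothetical; the example values echo the Redis scenario above, with the blast-radius wording written for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    """One conclusion with all four parts; the dashboard link is for verification."""
    symptom: str
    cause: str
    blast_radius: str
    fix: str
    verify_url: str  # hypothetical dashboard link: where you check, not where you start

    def render(self) -> str:
        # Conclusion first, evidence last: the reader should not need the dashboard to act.
        return "\n".join([
            f"Symptom: {self.symptom}",
            f"Cause: {self.cause}",
            f"Blast radius: {self.blast_radius}",
            f"Fix: {self.fix}",
            f"Verify: {self.verify_url}",
        ])


print(Insight(
    symptom="used_memory on cache-02 is climbing 42 MB/min against a 4 GB maxmemory",
    cause="maxmemory-policy is noeviction and key count in db0 is growing linearly",
    blast_radius="writes to cache-02 will be rejected once maxmemory is reached",
    fix="CONFIG SET maxmemory-policy allkeys-lru, then investigate key growth in db0",
    verify_url="https://dashboards.example.com/cache-02",
).render())
```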

A quick test for any alert you own

Pull up your three noisiest alerts and ask of each: does it state a quantified symptom, a correlated cause, a concrete blast radius, and a specific fix? Most will have one of the four. The ones your team actually trusts — the ones nobody mutes — will have all four. That's not a coincidence. It's the whole difference between a notification and an insight.
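The same test can be run mechanically over alert definitions. The heuristics below are crude assumptions, not a standard: a symptom "has numbers" if it contains a digit, and a fix that opens with "investigate", "check", or "look into" counts as homework rather than a command.

```python
import re

VAGUE_FIX_PREFIXES = ("investigate", "check", "look into")


def audit(alert: dict[str, str]) -> list[str]:
    """Return which of the four parts are missing or too vague to act on."""
    problems = []
    if not re.search(r"\d", alert.get("symptom", "")):
        problems.append("symptom has no number")
    if not alert.get("cause", "").strip():
        problems.append("no cause")
    if not alert.get("blast_radius", "").strip():
        problems.append("no blast radius")
    fix = alert.get("fix", "").strip()
    if not fix or fix.lower().startswith(VAGUE_FIX_PREFIXES):
        problems.append("no concrete fix")
    return problems


print(audit({"symptom": "Redis memory high", "fix": "Investigate memory usage"}))
```

An empty list means the alert passes all four checks; anything else names the part to add before the alert earns a place in the on-call channel.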

Foreseer is built around those four parts on purpose: every insight leads with the symptom and numbers, states the likely cause from correlated signals, estimates the blast radius, and hands you the command — then auto-resolves when the condition clears, so the channel stays trustworthy. An alert channel is only as valuable as it is believed.
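Auto-resolving without flapping usually relies on hysteresis: a separate, lower threshold to clear, plus a run of healthy samples before declaring the condition over. The sketch below is a generic illustration of that technique, not a description of how Foreseer implements it; the class name and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HysteresisAlert:
    fire_above: float   # fire when a sample exceeds this
    clear_below: float  # count a sample as healthy only below this (lower) line
    clear_after: int    # consecutive healthy samples required to resolve
    firing: bool = False
    _healthy_streak: int = 0

    def observe(self, value: float) -> Optional[str]:
        """Feed one sample; return "fired", "resolved", or None if nothing changed."""
        if not self.firing:
            if value > self.fire_above:
                self.firing = True
                self._healthy_streak = 0
                return "fired"
            return None
        if value < self.clear_below:
            self._healthy_streak += 1
            if self._healthy_streak >= self.clear_after:
                self.firing = False
                self._healthy_streak = 0
                return "resolved"
        else:
            self._healthy_streak = 0  # any unhealthy sample restarts the count
        return None


alert = HysteresisAlert(fire_above=90, clear_below=80, clear_after=3)
print([alert.observe(v) for v in [85, 95, 92, 79, 85, 79, 78, 77, 70]])
# [None, 'fired', None, None, None, None, None, 'resolved', None]
```

The gap between 90 and 80 is what keeps a metric hovering near one threshold from firing and resolving on every sample.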

See it on your own infrastructure

One line to install. Your first insight lands within minutes.


Talk to us

Questions about the product, Enterprise, or self-hosting? We read every message.

Use the contact form, or email us at hello@foreseer.app.