// How Haven thinks

Methodology constrains. LLM reasons. Data validates.

Haven's intelligence has three parts, and they each do a different job. A model of how customer operations behave. A language model that reads and reasons inside that model. Your data, calibrating both as the operation runs.

01 · Methodology

The graph of how operations behave.

A directed graph of about 50 structural relationships across the seven functions. Each one has a prior probability of activation, a lag estimate with an uncertainty range, and an expected effect size downstream. The graph encodes things like: when knowledge goes stale, agent agreement drops before AHT moves. When volume crosses a threshold, CSAT shifts before the team feels it. The relationships are explicit and inspectable. You can argue with any of them.
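The shape of one edge can be sketched in a few lines. The field names here are illustrative assumptions, not Haven's actual schema; the numbers mirror the Build → Perform relationship used in the worked example below.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str          # upstream function, e.g. "Build"
    target: str          # downstream function, e.g. "Perform"
    prior: float         # prior probability of activation
    lag_days: tuple      # (low, high) lag estimate, in days
    csat_effect: tuple   # expected downstream CSAT shift, in points

# Stale knowledge propagating from Build to Perform.
STALE_KNOWLEDGE = Edge(
    source="Build", target="Perform",
    prior=0.81, lag_days=(1, 3), csat_effect=(-5, -2),
)
```

Because each edge is plain data, "you can argue with any of them" is literal: every prior, lag range, and effect size is a field you can read and dispute.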

02 · LLM

Reasoning inside the constraints.

The language model does the writing. It does not invent the patterns it writes about. The methodology graph has those already. For each interaction the LLM reads the data, walks the graph, and drafts whatever fits the situation: coaching for an agent, an update for a macro, a flag for calibration.

03 · Data

Your operation, validating the model.

Methodology priors are where Haven starts. The longer it runs on your data, the more it learns about you specifically. Generic baselines get replaced by yours. Default thresholds get replaced by yours. Lag estimates tighten as the variance shrinks. By the time you have a few hundred closed cases, the graph reflects your operation, not a generic starting point.
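One way the tightening could work, as a hedged sketch: a conjugate normal update that precision-weights the methodology prior against the lags observed on your account. The function and numbers are illustrative, not Haven's actual math.

```python
def update_lag(prior_mean, prior_var, observations, obs_var):
    """Precision-weighted blend of a prior lag estimate and observed lags."""
    n = len(observations)
    sample_mean = sum(observations) / n
    post_precision = 1 / prior_var + n / obs_var
    post_mean = (prior_mean / prior_var + n * sample_mean / obs_var) / post_precision
    return post_mean, 1 / post_precision

# Generic prior: a 2-day lag with variance 1.0.
# Four observed lags from one account pull the estimate toward that account.
mean, var = update_lag(prior_mean=2.0, prior_var=1.0,
                       observations=[1.5, 2.5, 1.0, 2.0], obs_var=0.5)
# The posterior mean moves toward the account's data; the variance shrinks.
```

Each new observation adds precision, which is exactly why "lag estimates tighten as the variance shrinks."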

// A worked example · One signal · Ten days

A returns policy update. Five days to surface in CSAT.

One operational signal, traced through the methodology graph from start to finish. Each step uses a relationship the graph already knows. The reading is continuous. By the time the leader looks at the dashboard, the intervention is usually already drafted and waiting.

Returns policy update at a Tier-1 SaaS operation · 35 agents · 6 AI clusters · Reactive on Enable · Live trace
T+0 Product

Returns policy ships. Knowledge base not updated.

No detection yet. Haven is reading every interaction against the standard. The standard has not caught up to the policy.

T+24h Build

Three agents handle the same returns scenario three different ways. Agreement variance climbs.

Methodology: Build → Perform propagation, prior probability of activation 0.81. Expected lag 1 to 3 days. Expected downstream CSAT effect: a 2 to 5 point drop within seven days.

T+72h Perform

Average handle time on returns climbs 18 percent. Two AI clusters start escalating returns to humans at twice the baseline rate.

Methodology: AHT shift plus AI escalation rate climbing on the same scenario. Posterior on CSAT degradation within seven days updates from 0.81 (prior) to 0.94 (Bayes factor approximately 3.7 from the combined evidence). Recommended action threshold crossed.
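The update in that step is the standard odds-form Bayes rule: posterior odds equal the Bayes factor times the prior odds. A minimal sketch with the numbers from the trace (the function name is ours, not Haven's):

```python
def bayes_update(prior: float, bayes_factor: float) -> float:
    """Convert to odds, apply the Bayes factor, convert back to probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)

posterior = bayes_update(prior=0.81, bayes_factor=3.7)
print(round(posterior, 2))  # → 0.94
```

Combined evidence roughly 3.7 times more likely under "CSAT will degrade" than under "it won't" moves 0.81 to 0.94, which is the threshold crossing the trace reports.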

T+5d Measure

CSAT on returns degrades 4 points. Customers cite confusion about timelines.

Predicted on day 1. Confirmed on day 5. The system was already drafting the intervention.

T+5d Improve

Coaching draft for the team. Macro update for AI cluster #4847. Returns rubric flagged for rewrite.

Three drafts queued, pending the leader's call. Each one references the methodology trace and the data that triggered it. None of them deploy without approval.

T+10d Enable

Returns rubric rewrite goes live. Macro updated. AI clusters retrained on the new standard.

Methodology: standard rewrites recover CSAT within 7 to 10 days at this maturity stage. Prior 0.74. Lag estimate refined using 14 prior cases of this pattern shape across operations of similar maturity (cross-account pooling): expected recovery 4 to 6 days, 80 percent posterior interval.

T+14d Measure

CSAT on returns recovers. Variance closes. Loop logged for the next prior update.

Outcome captured for the validation loop. Predicted recovery: 4 to 6 days. Actual: 4 days. The labeled observation joins the dataset that recalibrates the methodology graph. A single observation does not move the prior much. Across hundreds of closed cases, the priors update measurably and the graph becomes calibrated to your operation.

What just happened. Haven saw the chain on day one. A QA tool wouldn't have flagged it until CSAT actually dropped on day five. A consultant would have spotted it next quarter, by which point the churn would already be on the books.

// The math of it

Why this works on a small team's data.

Most analytics tools fail on small teams because there isn't enough data to separate signal from coincidence. Haven works the other way around. It starts with what the methodology already knows is structurally true, then lets the data argue with it.

35 Operational metrics
595 Naive pairs to test
~50 Methodology relationships
12× Hypothesis-space reduction

35 operational metrics means 595 possible pairs to compare. Most of those pairs are meaningless. A team of five agents will not generate enough events for the signal-to-noise ratio to work at that scale; a tool that tests every pair finds correlations that aren't actually there. That's why off-the-shelf analytics tools almost never produce useful results for small support teams, no matter how clean the dashboard looks.
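The pair count is plain combinatorics, sketched here for concreteness:

```python
from math import comb

metrics = 35
naive_pairs = comb(metrics, 2)   # every unordered pair of metrics
structural = 50                  # relationships the methodology keeps
reduction = naive_pairs / structural

print(naive_pairs)               # → 595
print(round(reduction))          # → 12
```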

Methodology constrains the search. Of those 595 possible pairs, only about 50 are structurally meaningful, and Haven knows which ones. It starts with informative priors over those 50 instead of flat priors over 595. That isn't just fewer hypotheses to test. It also means the system has actual beliefs about what is likely to happen, so it can act on weak evidence rather than waiting for statistical significance. A flat-prior or null-hypothesis-testing approach would need hundreds of observations before it could justify any conclusion. Strong priors let useful posteriors emerge from a 5-agent team's volume.

As observations pile up, the posterior shifts from prior-dominated to data-dominated. Lag estimates tighten as variance shrinks. Thresholds calibrate to how your specific team behaves. After a few hundred closed cases, what you have is a calibrated model of how this particular operation behaves under stress.

Every prediction gets checked against what actually happened. A predicted CSAT recovery has an actual recovery time. A predicted variance closure has an actual closure date. Each closed case becomes a labeled observation, and the methodology graph re-calibrates against reality on a continuous basis. When predictions miss the same way repeatedly, the relevant prior gets updated. The size of that update is governed by a hierarchical model that pools information across operations at similar maturity, so a single weird outcome on one account does not whipsaw the parameters for everyone. The whole pipeline is inspectable. You can pull up any prediction Haven has made, what actually happened, and what changed in the model as a result.
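One simple way to get that pooling behavior, as an illustrative sketch (the weighting rule and numbers are assumptions, not Haven's hierarchical model): shrink each account's observed rate toward the cross-account prior, weighted by how many closed cases the account has.

```python
def shrink(account_rate: float, n_cases: int,
           pooled_prior: float, prior_strength: float = 50.0) -> float:
    """Blend an account's observed rate with the pooled cross-account prior."""
    w = n_cases / (n_cases + prior_strength)
    return w * account_rate + (1 - w) * pooled_prior

# One weird outcome on one account barely moves the parameter...
one_off = shrink(account_rate=0.0, n_cases=1, pooled_prior=0.81)
# ...while a few hundred closed cases come to dominate it.
seasoned = shrink(account_rate=0.70, n_cases=400, pooled_prior=0.81)
```

With one case the estimate stays near the pooled 0.81; with 400 cases it sits close to the account's own 0.70. That is the "no whipsaw" property in miniature.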

// What the leader sees

Not an alert feed. A reasoning trace.

Most monitoring tools tell you something changed and stop there. Haven tells you what changed, why it matters, what is going to break next if you do nothing, and what to do about it. The whole chain of reasoning is on the screen, including the parts Haven isn't sure about.

Live read · Readout · 2026-04-25 · 14:32 GMT
Detected

Agreement variance climbing on returns. Three agents, same scenario, different outcomes. AI cluster escalation rate up 2.1× on the same scenario.

AGREEMENT_VAR = 0.34 (baseline 0.18) · Δ_24H = +0.18 · DETECT_CONF = 0.91

Methodology

Variance above threshold combined with Reactive maturity on Enable points to a rubric coverage gap. Expected propagation to Perform: 24 to 72 hours. Expected propagation to CSAT: 5 to 7 days. The graph shows this pattern surfaces in returns 14 times per year on average across operations of this size.

Recommended

Rewrite the returns rubric in the Enable builder. Push the macro update to AI cluster #4847. Schedule one coaching block with the three flagged agents.

EST_CSAT_RECOVERY = 7 to 10 days · DRAFT_READY = YES

Prediction posterior

POSTERIOR = 0.78 · n=14 cross-account cases of this pattern shape

// What this is not

Not alerts. Not dashboards. Not chatbots.

Three categories of tool already sit in the buyer's stack. Each does part of the job. None of them treats the operation itself as something that learns.

Alert tools

Tell you something changed, and what threshold tripped. Don't connect signals to causes. Don't tell you what to do about it.

Dashboards

Show you metrics, broken out by function. Each tile is its own little world. The connections between tiles live in the leader's head, if anywhere. The operation moves and the dashboard does not move with it.

AI chatbots

Replace the agent. Optimise for deflection. The interaction never reaches the team, so the team never learns from it. The operation does not improve. The deflection rate goes up.

Haven

Reads every interaction against the standard. Connects signals through the methodology graph. Drafts the intervention before anyone has had to call a meeting about it. Validates each prediction against the actual recovery. Logs the closed case for the next round of updates.

See how Haven reads your operation.

5 minutes. A reading across all seven functions, in your inbox today. No call. No demo. No follow-up.
Start the diagnostic