How your operation actually shows up. Across humans and AI.
Perform is the function nobody names but everyone feels. It's the difference between an operation that closes tickets and an operation that builds trust. The bar that defines what "good" looks like, every day, on every conversation. Human or AI.
v 02 · live
What Perform means.
Perform is the work of showing up consistently across an operation. Not the work of writing scripts. Not the work of monitoring dashboards. The work of agreeing what a good conversation sounds like, calibrating to that bar every week, and scoring every interaction against it. The team's interactions and the AI's, against one standard.
Most CX teams have a vague sense of "good." Senior agents carry it in their heads. The AI runs on a system prompt nobody calibrated. New hires absorb a third version through osmosis. Quality scoring catches the worst cases on the human side; the AI's vendor dashboard reports against its own metrics. Nobody can consistently answer the question "what does great look like, today, for this customer's situation?" across both populations.
That's the gap Perform names. The bar isn't written down. The standard isn't shared. The AI's behavior is calibrated by a vendor CSM the team has never met. The calibration cadence is monthly at best, and one-sided when it happens. Agents fly on instinct. Managers grade on instinct. The AI drifts on whatever the prompt last said. Quality drifts everywhere.
Haven's Perform module builds the shared standard first. Three to five named dimensions. Four named levels per dimension. Every interaction scored against it, human and AI alike, in real time. The bar becomes legible. New hires onboard against it. The AI is built on it. Senior agents teach against it. Quality stops drifting on either side.
The work isn't fancy. It's a craft skill that's been buried under "QA software" for ten years and split across two stacks for the last two. Haven names it, structures it, and holds it across both populations. That's the function.
The progression. Four levels.
"Good" lives in senior agents' heads, and in whatever the AI's system prompt last said. Quality is monitored after the fact on the human side. The AI's behavior is read by a vendor dashboard nobody verifies. New hires learn through ride-alongs. Calibration is monthly or quarterly, one-sided, often skipped.
- No written standard
- No shared definition of good
- QA scoring exists but isn't trusted
- AI runs on whatever the system prompt last said
A bar exists for humans, but it isn't shared. The AI is on its own. The lead has the bar. Some senior agents have it. New hires don't. The AI is calibrated by a vendor CSM the team has never met. Quality scoring catches obvious misses on the human side but doesn't read the AI at all.
- Lead carries the bar
- Coaching happens 1:1 on humans only
- AI calibration owned by the vendor
- Humans and AI drift in different directions; nobody reads them together
The bar is named, owned, and calibrated weekly across humans and AI. A shared standard. Three to five dimensions. Four levels each. Every interaction scored against it, whether the team handled it or the AI did. Every operator knows what good looks like, today.
- Written standard, version-controlled
- Weekly calibration across humans and AI
- Shared with the team and the AI's prompt owner
- Owner named
The standard evolves with the work and updates both populations. Calibration findings update the standard. The standard trains new hires and updates the AI's prompt structure. Quality drift on either side is detected before it shows up in CSAT.
- Self-improving standard
- Auto-onboarding from standard (human and AI)
- Drift detection on both populations
- Drift caught at the standard line, not at CSAT
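One way to picture drift detection on both populations is a comparison of recent scores against a baseline, per dimension, for humans and AI separately. The sketch below is illustrative only: the function name `detect_drift`, the input shape, and the 0.5-level tolerance are assumptions, not Haven's implementation.

```python
from statistics import mean


def detect_drift(baseline, recent, tolerance=0.5):
    """Flag dimensions whose mean level moved by more than `tolerance`.

    baseline / recent: {population: {dimension: [levels 1..4]}}
    where population is "human" or "ai". The tolerance is a
    hypothetical threshold, not a value taken from the product.
    """
    drifted = []
    for population in sorted(recent):
        for dimension in sorted(recent[population]):
            delta = (mean(recent[population][dimension])
                     - mean(baseline[population][dimension]))
            if abs(delta) > tolerance:
                drifted.append((population, dimension, round(delta, 2)))
    return drifted
```

Because the check runs on standard scores rather than survey results, a slide on either side would surface in the same weekly view, before it reaches CSAT.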
What Perform builds.
The shared standard
Three to five dimensions. Four levels per dimension. Calibrated weekly across both humans and AI. The highest-leverage artifact in the function.
- 3–5 dimensions, 4 levels each
- Calibrated examples per level
- Every interaction scored against it, human and AI
- Linked to onboarding & Enable
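The shape of the shared standard can be sketched as a small data structure: three to five dimensions, four levels each, and one scoring path for human-handled and AI-handled interactions. This is a minimal sketch under stated assumptions. The class names, the dimension names ("accuracy", "tone", "resolution"), and the level labels are hypothetical placeholders, not Haven's schema.

```python
from __future__ import annotations

from dataclasses import dataclass

LEVELS_PER_DIMENSION = 4  # from the text: four named levels per dimension


@dataclass(frozen=True)
class Dimension:
    name: str
    levels: tuple[str, ...]  # level names, lowest to highest

    def __post_init__(self):
        if len(self.levels) != LEVELS_PER_DIMENSION:
            raise ValueError(f"{self.name}: expected {LEVELS_PER_DIMENSION} levels")


@dataclass(frozen=True)
class Standard:
    version: str
    dimensions: tuple[Dimension, ...]

    def __post_init__(self):
        if not 3 <= len(self.dimensions) <= 5:
            raise ValueError("a standard has three to five dimensions")

    def score(self, handled_by: str, levels: dict[str, int]) -> dict:
        """Validate one interaction's per-dimension levels (1..4)."""
        if handled_by not in ("human", "ai"):
            raise ValueError("handled_by must be 'human' or 'ai'")
        expected = {d.name for d in self.dimensions}
        if set(levels) != expected:
            raise ValueError(f"scores must cover exactly: {sorted(expected)}")
        for name, level in levels.items():
            if not 1 <= level <= LEVELS_PER_DIMENSION:
                raise ValueError(f"{name}: level must be 1..{LEVELS_PER_DIMENSION}")
        return {
            "standard_version": self.version,
            "handled_by": handled_by,
            "levels": dict(levels),
            "lowest_dimension": min(levels, key=levels.get),
        }


# Hypothetical dimension and level names, for illustration only.
LEVEL_NAMES = ("L1", "L2", "L3", "L4")
EXAMPLE = Standard(
    version="example-1",
    dimensions=tuple(
        Dimension(name, LEVEL_NAMES) for name in ("accuracy", "tone", "resolution")
    ),
)
```

Calling `EXAMPLE.score("ai", {"accuracy": 3, "tone": 2, "resolution": 4})` runs through the same validation as a human-handled interaction would, which is the point of a single standard. Stamping each score with the standard's version keeps results comparable after the standard is recalibrated.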
The calibration ritual
A weekly 30-minute session where the team scores the same five conversations. Mix of human-handled and AI-handled. Findings update the standard and route to prompt changes where the AI is the source.
- 5 conversations, mix of human and AI
- Findings update the standard live
- Disagreement log → coaching or prompt updates
- Whole team participates; prompt owner included
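The disagreement log from a calibration session can be sketched in a few lines. Everything here is an assumption made for illustration: the function name `disagreement_log`, the tuple layout, the one-level spread threshold, and the rule that disagreements on AI-handled conversations route to prompt updates while human-handled ones route to coaching.

```python
from collections import defaultdict


def disagreement_log(scores, handled_by, max_spread=1):
    """Find conversations where scorers disagree on a dimension.

    scores: iterable of (conversation_id, scorer, dimension, level)
    handled_by: {conversation_id: "human" or "ai"}
    max_spread: largest level gap treated as agreement (hypothetical)
    """
    by_item = defaultdict(list)
    for conv_id, _scorer, dimension, level in scores:
        by_item[(conv_id, dimension)].append(level)

    log = []
    for (conv_id, dimension), levels in sorted(by_item.items()):
        spread = max(levels) - min(levels)
        if spread > max_spread:
            # Assumed routing rule: the handler decides where the fix goes.
            route = "prompt update" if handled_by[conv_id] == "ai" else "coaching"
            log.append({
                "conversation": conv_id,
                "dimension": dimension,
                "spread": spread,
                "route": route,
            })
    return log
```

With five conversations and a whole team scoring, the log stays short enough to review inside the 30-minute session.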
The onboarding ladder
A six-week ramp in which new hires move from Level 01 to Level 02 against the standard, with named milestones. Cuts ramp time from 12 weeks to 6.
- Six-week ramp, Level 01 → 02
- Named milestones every two weeks
- Live scoring of the cohort against the standard
- Cuts ramp from 12 weeks to 6
See it cascade.
A Perform signal rarely stops at Perform. A QA flag on the team often traces to a knowledge gap the AI is hitting on the same intent, and the fix routes back through Enable. One root cause, not two performance conversations.