Maturity Levels
Every sub-domain in the SRF is assessed across three axes — Detection, Mitigations, and Response — and each axis is scored on a four-level maturity scale from 0 to 3. This page documents what each level means, how the score is calculated from capability checklists, how axis scores roll up into domain scores and an overall grade, and where the empirical data shows the highest return on improvement investment.
The four maturity levels
The level names below are verbatim from the framework record.
| Level | Name | What it means | Recognisable signals |
|---|---|---|---|
| 0 | None — No capability exists | The organisation has not addressed this area. | No monitoring, no documented procedure, no designated owner. The team learns of failures from customer reports. |
| 1 | Ad-hoc — Reactive, person-dependent | Capability exists but is informal and relies on specific individuals. | The senior engineer knows what to check. There is no runbook. If that person is unavailable, response time doubles. |
| 2 | Defined — Documented processes, manual thresholds | Documented processes and thresholds exist but require human judgment to trigger and may not be consistently followed. | A runbook exists. A monitoring dashboard exists. Someone has written down the escalation path. Alerting requires a human to notice the dashboard. |
| 3 | Managed — Automated, consistent, measurable | Controls operate without human initiation. Effectiveness is tracked through metrics. | Alerts fire automatically. Circuit breakers engage without human intervention. Deployments roll back on canary failure. Reliability is a system property rather than a human performance issue. |
Most organisations in the dataset operate at Level 1 across most dimensions. Most believe they are at Level 2. The distinction matters: documented processes that are not consistently followed or not automatically triggered are Level 2 at best, and the assessment is calibrated to reflect that.
How axis scores are computed
Capabilities and checklists
Each axis (Detection, Mitigations, Response) within a sub-domain contains a set of specific capability statements. For each axis, the assessor checks which capabilities are currently in place. The number of capabilities per axis varies by sub-domain — typically three to seven.
Prerequisite enforcement
Prerequisites are enforced before scoring. A capability only counts toward the score if all of its declared prerequisites are also checked. This reflects the logical dependency structure of controls: automated alerting cannot be meaningfully claimed if no monitoring exists to alert on; circuit breakers cannot be meaningfully claimed if there is no dependency identification in place.
Prerequisites are resolved iteratively across chains. If capability C depends on capability B, which depends on capability A, then checking B and C without A removes both from the effective set: B drops out because A is missing, and C then drops out because B is no longer effective. The scoring operates on the effective checked set (the capabilities that remain after prerequisite enforcement), not the raw checked set.
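A minimal Python sketch of the iterative resolution, assuming each capability's prerequisites are given as a set of capability IDs. The function name `effective_checked` and the dictionary layout are illustrative and not part of the SRF. The sketch repeats a removal pass until nothing changes, which is what lets a missing A propagate through B to C.

```python
from typing import Dict, Set


def effective_checked(checked: Set[str], prereqs: Dict[str, Set[str]]) -> Set[str]:
    """Return the checked capabilities whose prerequisites are all effectively present.

    Removal passes repeat until a pass removes nothing, so a missing capability
    at the root of a chain invalidates everything that depends on it.
    """
    effective = set(checked)
    changed = True
    while changed:
        changed = False
        for cap in list(effective):
            # A capability survives only if all of its prerequisites survive too.
            if not prereqs.get(cap, set()) <= effective:
                effective.discard(cap)
                changed = True
    return effective


# The chain from the text: C depends on B, and B depends on A.
chain = {"B": {"A"}, "C": {"B"}}
print(sorted(effective_checked({"B", "C"}, chain)))       # []
print(sorted(effective_checked({"A", "B", "C"}, chain)))  # ['A', 'B', 'C']
```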
Score formula
Axis scoring is level-contiguous, not proportional. A level L is achieved only if every in-scope capability tagged at level L is in the effective-checked set. The axis score is the highest L for which levels 1..L are all fully satisfied:
$$\text{axisLevel} = \max\bigl(\{0\} \cup \{\, L \in \{1, 2, 3\} : \text{every capability at levels } 1, \dots, L \text{ is in the effective-checked set} \,\}\bigr)$$
This produces a strict ladder. Checking a Level 3 capability without Level 1 and Level 2 in place does not advance the score — the axis remains at whichever prior level is fully satisfied. The behaviour reflects the intent of the maturity model: Level 3 describes automated, measurable controls that presume a baseline of Level 1 visibility and Level 2 documented process, and crediting Level 3 without those foundations would misrepresent actual maturity. Levels with no in-scope capabilities are vacuously satisfied and do not block advancement.
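The ladder can be sketched as follows. This assumes the effective-checked set has already been computed and that each in-scope capability on the axis carries a level tag of 1, 2 or 3; the function name and the capability IDs in the usage lines are hypothetical.

```python
from typing import Dict, Set


def axis_level(levels: Dict[str, int], effective: Set[str]) -> int:
    """Highest level L (0-3) such that every capability at levels 1..L is effective.

    `levels` maps each in-scope capability on the axis to its level tag (1, 2 or 3).
    `effective` is the checked set after prerequisite enforcement.
    """
    score = 0
    for level in (1, 2, 3):
        required = {cap for cap, tag in levels.items() if tag == level}
        # An empty `required` set is a subset of anything: vacuously satisfied.
        if not required <= effective:
            break  # the ladder stops at the first incomplete level
        score = level
    return score


# Hypothetical capability IDs for one axis.
levels = {"dashboard": 1, "runbook": 2, "auto_alert": 3}
print(axis_level(levels, {"auto_alert"}))                          # 0
print(axis_level(levels, {"dashboard", "auto_alert"}))             # 1
print(axis_level(levels, {"dashboard", "runbook", "auto_alert"}))  # 3
```

Checking only the Level 3 capability leaves the axis at 0, and a level with no tagged capabilities passes the subset test trivially, which is the vacuous-satisfaction rule.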
Prerequisite enforcement runs before level evaluation: a capability counts toward its level only if it remains in the effective-checked set after the prerequisite resolution described above.
A few boundary cases:
- An axis where capabilities are checked but not all Level 1 capabilities are satisfied scores Level 0 — the foundation is incomplete.
- An axis where no capabilities are checked (but the assessor interacted with it) also scores Level 0 — the organisation has the sub-domain in scope but has none of the capabilities in place.
- An axis that is left unanswered entirely scores null and is excluded from domain and overall averages. This is appropriate for sub-domains that are genuinely out of scope.
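The boundary cases can be expressed by extending the same ladder with an explicit answered flag. Representing "the assessor interacted with this axis" as a boolean `answered` is an assumption made for this sketch, as are the function name and capability IDs; returning `None` stands in for the null score.

```python
from typing import Dict, Optional, Set


def score_axis(answered: bool, levels: Dict[str, int], effective: Set[str]) -> Optional[int]:
    """Axis score with the boundary cases: None when unanswered, else the ladder level."""
    if not answered:
        return None  # null: excluded from domain and overall averages
    score = 0
    for level in (1, 2, 3):
        if not {cap for cap, tag in levels.items() if tag == level} <= effective:
            break
        score = level
    return score


levels = {"a1": 1, "a2": 1, "b1": 2}  # hypothetical capability IDs
print(score_axis(True, levels, {"a1", "b1"}))  # 0: Level 1 is incomplete
print(score_axis(True, levels, set()))         # 0: in scope, nothing in place
print(score_axis(False, levels, set()))        # None: left unanswered
```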
Domain and overall scoring
Domain score
A domain score is the average of all answered axis scores across all sub-domains in the domain. If an axis is null (unanswered), it is excluded from the average. The domain score is expressed as a percentage of the maximum possible (3.0), producing a domain percentage between 0% and 100%.
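A sketch of the domain calculation, assuming a domain is represented as a flat list of its axis scores with `None` for unanswered axes. The function name is illustrative. How the SRF handles a domain in which every axis is unanswered is not stated above; returning `None` in that case is an assumption.

```python
from typing import List, Optional

MAX_LEVEL = 3.0


def domain_percentage(axis_scores: List[Optional[int]]) -> Optional[float]:
    """Average the answered axis scores in a domain and express it as % of 3.0."""
    answered = [s for s in axis_scores if s is not None]  # drop null axes
    if not answered:
        return None  # assumption: no answered axes means no domain score
    return 100.0 * (sum(answered) / len(answered)) / MAX_LEVEL


# Two sub-domains, each scored on Detection, Mitigations, Response; one axis unanswered.
scores = [2, 1, 2, 3, None, 1]
print(round(domain_percentage(scores), 1))  # 60.0
```

The unanswered axis is dropped rather than counted as zero, so the five answered scores average to 1.8.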
Overall score
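The overall score is the average of all answered axis scores across all domains, pooled into a single average rather than computed as an average of domain scores, and it is expressed as a percentage of 3.0 in the same way as domain scores. Domain weighting by incident frequency is reflected in the assessment structure (more sub-domains in higher-frequency domains, and therefore more axis scores in the pool) rather than in post-hoc weighting of the average.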
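The difference between pooling axis scores and averaging domain scores is easiest to see with numbers. In this sketch the domain names and scores are invented for illustration, and `mean_of_domain_means` is included only for comparison: a domain with three sub-domains contributes nine axis scores and a domain with one sub-domain contributes three, so the larger domain carries more weight in the pooled average.

```python
from typing import Dict, List, Optional


def overall_percentage(domains: Dict[str, List[Optional[int]]]) -> Optional[float]:
    """Pool every answered axis score across all domains, then average once."""
    pooled = [s for scores in domains.values() for s in scores if s is not None]
    if not pooled:
        return None
    return 100.0 * (sum(pooled) / len(pooled)) / 3.0


def mean_of_domain_means(domains: Dict[str, List[Optional[int]]]) -> float:
    """For comparison only: average within each domain, then across domains."""
    means = []
    for scores in domains.values():
        answered = [s for s in scores if s is not None]
        if answered:  # skip domains with no answered axes
            means.append(sum(answered) / len(answered))
    return 100.0 * (sum(means) / len(means)) / 3.0


# Invented data: three sub-domains (nine axes) at Level 1, one sub-domain at Level 3.
domains = {"large_domain": [1] * 9, "small_domain": [3, 3, 3]}
print(overall_percentage(domains))              # 50.0
print(round(mean_of_domain_means(domains), 1))  # 66.7
```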
Letter grade
The overall percentage maps to a letter grade:
| Grade | Threshold | Interpretation |
|---|---|---|
| A | ≥ 87% | Strong controls across the framework; most failure modes are addressed at Level 3 |
| B | ≥ 75% | Good overall posture with identifiable gaps; most high-frequency domains are well-controlled |
| C | ≥ 60% | Adequate detection and response with material Mitigations gaps; the median organisation in the dataset |
| D | ≥ 45% | Significant gaps across multiple domains; material downtime risk |
| F | < 45% | Critical gaps; the organisation is operating with minimal reliability management in place |
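A minimal mapping from overall percentage to letter grade using the thresholds in the table. It assumes the thresholds are applied to the unrounded percentage; the names `GRADE_THRESHOLDS` and `letter_grade` are illustrative.

```python
GRADE_THRESHOLDS = (("A", 87.0), ("B", 75.0), ("C", 60.0), ("D", 45.0))


def letter_grade(percentage: float) -> str:
    """Map an overall percentage to a letter grade; anything below 45% is F."""
    for grade, threshold in GRADE_THRESHOLDS:  # checked from highest to lowest
        if percentage >= threshold:
            return grade
    return "F"


print([letter_grade(p) for p in (87.0, 86.9, 60.0, 59.9, 44.9)])
# ['A', 'B', 'C', 'D', 'F']
```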
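The table places the median organisation in the dataset at Grade C. That is a statement about the median organisation rather than about the mean axis profile: the observed mean axis scores (Detection 1.8, Mitigations 1.1, Response 1.6) combine, weighted equally, to 1.5 / 3.0, or 50%, which on its own falls in the D band.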
The Mitigations gap
The most important finding from the 192-incident dataset is not which domain fails most often — it is how different the outcomes are depending on the Mitigations axis score.
Average axis scores across the dataset:
| Axis | Average score | Average percentage |
|---|---|---|
| Detection | 1.8 / 3.0 | 60% |
| Mitigations | 1.1 / 3.0 | 37% |
| Response | 1.6 / 3.0 | 53% |
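The percentage column and the 0.7-point gap follow directly from the averages in the table; this short check rounds to whole percentages as the table does. The helper name `as_percentage` is illustrative.

```python
def as_percentage(score: float, max_level: float = 3.0) -> int:
    """Express an axis score as a whole-number percentage of the 3.0 maximum."""
    return round(100 * score / max_level)


averages = {"Detection": 1.8, "Mitigations": 1.1, "Response": 1.6}
for axis, score in averages.items():
    print(axis, as_percentage(score))  # Detection 60, then Mitigations 37, then Response 53

gap = averages["Detection"] - averages["Mitigations"]
print(f"{gap:.1f}")  # 0.7
```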
Mitigations is consistently the weakest axis. The gap between Detection and Mitigations — 0.7 points on a 3-point scale — is the single most important finding in the dataset. Organisations are investing heavily in watching things break (Detection) and in recovering from breakage (Response), but underinvesting in the architectural controls that would prevent many incidents from reaching production impact in the first place.
The consequences are quantifiable. Incidents in organisations with Mitigations at Level 3 averaged 2.1 hours in duration; incidents in organisations with Mitigations at Level 0–1 averaged 8.7 hours, roughly four times as long. This is not simply because well-managed organisations are better at everything. It is specifically because architectural mitigations (circuit breakers, redundancy, rate limiting, connection pool management, canary deployments) absorb failures that would otherwise become outages. More importantly, the strong-Mitigations organisations had fewer incidents reaching production impact at all: their architectural protections converted what would have been outages into contained events.
The jump from Level 1 to Level 3 on Mitigations is the single highest-ROI reliability improvement available to most organisations. Level 1 means the architecture has minimal protection — most failures become catastrophic incidents. Level 3 means the architecture absorbs most failures automatically — only the exceptional cases become incidents, and they are shorter when they do.
D-M-R profiles, not point scores
The overall grade is useful for communication: a CEO can understand "we are Grade C and need to reach Grade B" in a way they cannot understand "our D-M-R average is 1.47." But the overall grade obscures the information that actually drives remediation decisions.
Two organisations can share an identical overall grade while needing completely different investment priorities:
- An organisation with a 3-1-1 profile (Detection 3, Mitigations 1, Response 1) has strong monitoring but fragile architecture and slow recovery. Additional monitoring investment yields negligible reliability improvement. What it needs is architectural redesign: redundancy, circuit breakers, connection pool management, automated failover.
- An organisation with a 1-3-1 profile (Detection 1, Mitigations 3, Response 1) has robust architectural protections but limited visibility. It may be avoiding many incidents through good design, but when something does break, the team is blind and slow. It needs monitoring, alerting, and observability investment.
- An organisation with a 1-1-3 profile (Detection 1, Mitigations 1, Response 3) can recover from anything but cannot prevent or contain failures before they reach users. Every incident is a managed fire drill. It needs preventive investment across both Detection and Mitigations.
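The claim that these three profiles share a point score can be checked directly. The sketch treats each profile as a single set of Detection, Mitigations and Response scores (a simplification: a real assessment averages over many sub-domains) and reports the overall percentage alongside the lowest-scoring axes. Each profile works out to 5 / 9 of the maximum, about 55.6%, which is Grade D under the thresholds above. The function name `profile_summary` is illustrative.

```python
from typing import Dict, List, Tuple


def profile_summary(axes: Dict[str, int]) -> Tuple[float, List[str]]:
    """Overall percentage of a single D-M-R profile, plus its lowest-scoring axes."""
    pct = 100 * sum(axes.values()) / (3 * len(axes))
    lowest = min(axes.values())
    weakest = [axis for axis, score in axes.items() if score == lowest]
    return pct, weakest


profiles = {
    "3-1-1": {"Detection": 3, "Mitigations": 1, "Response": 1},
    "1-3-1": {"Detection": 1, "Mitigations": 3, "Response": 1},
    "1-1-3": {"Detection": 1, "Mitigations": 1, "Response": 3},
}
for name, axes in profiles.items():
    pct, weakest = profile_summary(axes)
    print(f"{name}: {pct:.1f}%, weakest: {', '.join(weakest)}")
# 3-1-1: 55.6%, weakest: Mitigations, Response
# 1-3-1: 55.6%, weakest: Detection, Response
# 1-1-3: 55.6%, weakest: Detection, Mitigations
```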
The same point score, three different remediation strategies. This is why the SRF reports profiles rather than single numbers, and why the Export Backlog output surfaces the specific capability gaps behind each axis score — so that remediation work addresses the actual weaknesses, not the overall average.
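A hypothetical helper in the spirit of surfacing capability gaps: given an axis's level tags and effective-checked set, it returns the current level and the capabilities missing at the first unsatisfied level, which are the ones blocking advancement under the ladder rule. This illustrates the ladder logic only; it is not the Export Backlog implementation, and the function name and capability IDs are invented.

```python
from typing import Dict, List, Set, Tuple


def next_level_gaps(levels: Dict[str, int], effective: Set[str]) -> Tuple[int, List[str]]:
    """Return the current axis level and the capabilities blocking the next level."""
    current = 0
    for level in (1, 2, 3):
        missing = sorted(cap for cap, tag in levels.items()
                         if tag == level and cap not in effective)
        if missing:
            return current, missing  # the first incomplete level is the gap to close
        current = level
    return current, []  # already at Level 3


# Hypothetical Mitigations capabilities: circuit breakers are checked but
# cannot count while the Level 2 capability is missing.
levels = {"dependency_map": 1, "timeout_policy": 2, "circuit_breakers": 3}
print(next_level_gaps(levels, {"dependency_map", "circuit_breakers"}))
# (1, ['timeout_policy'])
```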