Tech Due Diligence OPS-2: Incident Management and Response
What This Control Requires
The assessor evaluates the incident management process, including incident detection speed, response procedures, communication practices, post-incident reviews, and the overall maturity of the incident response capability.
In Plain Language
How a team handles things going wrong in production reveals more about its operational maturity than almost anything else. Incident management covers detecting, responding to, and recovering from production issues while minimising customer impact and generating lessons for next time.
Assessors look at detection mechanisms (alerts, user reports, health checks), response procedures (who does what when an incident is declared), communication practices (status pages, customer notifications, internal coordination), severity classification and escalation, mean time to detect (MTTD) and mean time to resolve (MTTR), post-incident review practices, and trends in incident frequency and severity.
Post-incident reviews get particular scrutiny. Teams that run blameless retrospectives and actually follow through on improvement actions show a learning culture that reduces recurrence over time. Teams that fight fires but never analyse them are almost guaranteed to keep making the same mistakes.
How to Implement
Map out a clear incident management process covering each phase:
- Detection: how incidents are spotted (alerts, user reports, synthetic monitoring)
- Declaration: criteria and severity levels
- Response: defined roles (incident commander, comms lead, technical lead)
- Communication: internal notifications, status page, customer updates
- Resolution: troubleshooting, mitigation, recovery
- Review: post-incident analysis and improvement actions
Define severity levels with clear criteria:
- SEV-1 (Critical): complete outage or data loss affecting all users
- SEV-2 (High): major feature degraded with significant user impact
- SEV-3 (Medium): minor feature issue, limited impact
- SEV-4 (Low): cosmetic or minor, minimal impact
Set response time expectations and escalation paths for each level.
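The four severity levels can be captured as data so that tooling and people apply the same criteria. The sketch below is one possible encoding, not a prescribed one. The acknowledgement targets, the escalation contacts, and the inputs to `classify` are all hypothetical placeholders; replace them with values your team has agreed on.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # Critical
    SEV2 = 2  # High
    SEV3 = 3  # Medium
    SEV4 = 4  # Low


@dataclass(frozen=True)
class SeverityPolicy:
    label: str
    criteria: str
    ack_minutes: int   # hypothetical response-time target, not from the control text
    escalate_to: str   # hypothetical escalation contact


POLICIES = {
    Severity.SEV1: SeverityPolicy(
        "Critical", "complete outage or data loss affecting all users",
        5, "engineering leadership"),
    Severity.SEV2: SeverityPolicy(
        "High", "major feature degraded with significant user impact",
        15, "on-call manager"),
    Severity.SEV3: SeverityPolicy(
        "Medium", "minor feature issue, limited impact",
        60, "owning team lead"),
    Severity.SEV4: SeverityPolicy(
        "Low", "cosmetic or minor, minimal impact",
        480, "owning team backlog"),
}


def classify(full_outage: bool, data_loss: bool,
             major_feature_degraded: bool, functional_impact: bool) -> Severity:
    """Map observed impact to a severity level, checking the worst case first."""
    if full_outage or data_loss:
        return Severity.SEV1
    if major_feature_degraded:
        return Severity.SEV2
    if functional_impact:
        return Severity.SEV3
    return Severity.SEV4


if __name__ == "__main__":
    sev = classify(full_outage=False, data_loss=False,
                   major_feature_degraded=True, functional_impact=True)
    policy = POLICIES[sev]
    print(sev.name, policy.label, policy.ack_minutes, policy.escalate_to)
```

Keeping the criteria next to the response targets means the severity table in your process documentation and the one used by your alerting or ticketing tooling can be generated from a single source.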
Set up a public status page (Statuspage.io, Cachet, or similar) for communicating with customers during incidents. Update it promptly when something goes wrong and throughout resolution. Transparency during incidents builds far more trust than silence.
Run blameless post-incident reviews for all SEV-1 and SEV-2 incidents, and optionally for interesting SEV-3s. Each review should produce a timeline from detection to resolution, root cause analysis, contributing factors, action items with owners and deadlines, and a written report shared with the team.
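A review template is easier to enforce when the required sections are checked mechanically. The following is a minimal sketch of a review record with a completeness check and an overdue-action query. The class names, field names, and the sample incident ID are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date
    done: bool = False


@dataclass
class PostIncidentReview:
    incident_id: str
    timeline: list[str] = field(default_factory=list)
    root_cause: str = ""
    contributing_factors: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that are still empty."""
        missing = []
        if not self.timeline:
            missing.append("timeline")
        if not self.root_cause.strip():
            missing.append("root cause")
        if not self.contributing_factors:
            missing.append("contributing factors")
        if not self.action_items:
            missing.append("action items")
        elif any(not item.owner.strip() for item in self.action_items):
            missing.append("action item owners")
        return missing

    def overdue(self, today: date) -> list[ActionItem]:
        """Return open action items whose deadline has passed."""
        return [a for a in self.action_items if not a.done and a.due < today]


if __name__ == "__main__":
    review = PostIncidentReview(
        incident_id="INC-EXAMPLE",  # hypothetical identifier
        timeline=["10:00 alert fired", "10:45 mitigation applied"],
        action_items=[ActionItem("Add certificate expiry alert", "", date(2024, 1, 31))],
    )
    print(review.missing_sections())
    print([a.description for a in review.overdue(date(2024, 2, 1))])
```

A check like this could run before a report is marked as published, and the overdue query gives a simple way to follow up on improvement actions.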
Track incident metrics: count per month by severity, MTTD, MTTR, recurrence rate for the same root cause, and action item completion rate from reviews. Without these numbers, you cannot demonstrate improvement.
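These metrics are simple to compute once each incident records a few timestamps. The sketch below assumes a hypothetical `Incident` record and makes two definitional choices that teams handle differently: MTTD is measured from when the fault began to when it was detected, and MTTR from detection to resolution. Recurrence rate is taken here as the share of incidents whose root cause had already appeared in an earlier incident.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    severity: int            # 1 = SEV-1 ... 4 = SEV-4
    started_at: datetime     # when the fault began
    detected_at: datetime    # when an alert or report surfaced it
    resolved_at: datetime    # when service was restored
    root_cause: str


def _mean_delta(deltas: list[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("at least one incident is required")
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: fault start to detection (assumed definition)."""
    return _mean_delta([i.detected_at - i.started_at for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: detection to resolution (assumed definition)."""
    return _mean_delta([i.resolved_at - i.detected_at for i in incidents])


def counts_by_severity(incidents: list[Incident]) -> dict[int, int]:
    return dict(Counter(i.severity for i in incidents))


def recurrence_rate(incidents: list[Incident]) -> float:
    """Share of incidents whose root cause was seen in an earlier incident."""
    if not incidents:
        raise ValueError("at least one incident is required")
    seen: set[str] = set()
    repeats = 0
    for incident in sorted(incidents, key=lambda i: i.started_at):
        if incident.root_cause in seen:
            repeats += 1
        seen.add(incident.root_cause)
    return repeats / len(incidents)


if __name__ == "__main__":
    sample = [
        Incident(1, datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 10),
                 datetime(2024, 1, 5, 11, 10), "expired certificate"),
        Incident(2, datetime(2024, 1, 20, 12, 0), datetime(2024, 1, 20, 12, 20),
                 datetime(2024, 1, 20, 12, 50), "expired certificate"),
    ]
    print(mttd(sample), mttr(sample), counts_by_severity(sample), recurrence_rate(sample))
```

Whichever definitions you choose, write them down and keep them fixed, otherwise month-to-month comparisons stop being meaningful.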
Run incident response drills periodically. Simulate realistic scenarios to test your detection, response, and communication procedures. Drills expose process and tooling gaps that stay hidden until you exercise them under pressure.
Evidence Your Auditor Will Request
- Incident management process documentation with severity levels
- Post-incident review reports for recent incidents
- Incident metrics (MTTD, MTTR, frequency, severity distribution)
- Status page or customer communication channel for incidents
- Action item tracking from post-incident reviews
Common Mistakes
- No defined incident process; response is chaotic and depends on who is available
- Post-incident reviews not conducted, or conducted with no follow-up on action items
- No customer communication during incidents; customers discover issues themselves
- Incident metrics not tracked; no visibility into trends or improvement
- Same root causes repeatedly cause incidents; lessons are not learned
Frequently Asked Questions
What MTTR is expected by assessors?
Should post-incident reviews be blameless?