Tech Due Diligence OPS-4: Uptime and SLA Management
What This Control Requires
The assessor evaluates the historical uptime track record, SLA commitments and compliance, uptime monitoring mechanisms, and the engineering practices that contribute to service reliability.
In Plain Language
For B2B SaaS, uptime is not just a technical metric; it is a commercial promise. Customers expect at least 99.9% availability (roughly 8.7 hours of downtime per year), and enterprise customers often demand 99.95% or higher. Assessors want to know whether your infrastructure and practices can sustainably deliver the reliability your market expects.
They will review:
- Historical uptime data, monthly and annual
- Your SLA or SLO targets and whether you are actually hitting them
- The number and severity of production incidents over the past 12 months
- The biggest outages and their root causes
- Whether uptime is verified by independent external monitoring
- Architectural features that support high availability, such as redundancy, failover, and load balancing
Reliability is a direct measure of operational quality. A solid uptime track record backed by genuine engineering practices gives investors confidence. A history of repeated outages with the same root causes does the opposite.
How to Implement
Define and publish SLA or SLO targets that fit your market. Common SaaS benchmarks: 99.9% (roughly 8.7 hours downtime per year) for standard tiers, 99.95% (roughly 4.4 hours) for enterprise tiers, 99.99% (roughly 52 minutes) for mission-critical services. Make sure your architecture can actually deliver the number before you commit to it.
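The downtime figures above follow from simple arithmetic, and it helps to run the numbers before committing to a target. A minimal sketch, assuming a 365-day year and a 30-day month (your SLA may define the measurement window differently); the function names are illustrative:

```python
# Convert an availability target into an allowed-downtime budget.
# Assumes a 365-day year (8,760 hours) and a 30-day month (43,200 minutes).

HOURS_PER_YEAR = 365 * 24
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_hours_per_year(target_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return (1 - target_pct / 100) * HOURS_PER_YEAR


def allowed_downtime_minutes_per_month(target_pct: float) -> float:
    """Minutes of downtime per 30-day month permitted by an availability target."""
    return (1 - target_pct / 100) * MINUTES_PER_30_DAY_MONTH


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        hours = allowed_downtime_hours_per_year(target)
        print(
            f"{target}%: {hours:.2f} h/year ({hours * 60:.1f} min), "
            f"{allowed_downtime_minutes_per_month(target):.1f} min/month"
        )
```

At 99.99%, the monthly allowance is only about 4.3 minutes, which is a useful sanity check on whether your detection and on-call response times can fit inside the target at all.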
Set up external uptime monitoring with services like Pingdom, UptimeRobot, Better Uptime, or Datadog Synthetics. External monitoring catches issues that internal tools miss, such as network path problems and DNS failures. Monitor key user-facing endpoints, not just infrastructure health checks.
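To make the idea concrete, here is a rough sketch of what a single external check does, using only the Python standard library. It is not a substitute for the services named above; `check_endpoint`, `ProbeResult`, and the timeout and latency thresholds are hypothetical, and the check treats slow responses as failures because users experience them that way.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    url: str
    up: bool
    status: Optional[int]
    latency_ms: float
    error: Optional[str] = None


def check_endpoint(url: str, timeout: float = 10.0, max_latency_ms: float = 3000.0) -> ProbeResult:
    """Run one synthetic check against a user-facing URL, as an external monitor would.

    "Up" means the request succeeded (urlopen raises for 4xx/5xx) within the
    latency threshold. Both thresholds are illustrative defaults.
    """
    start = time.monotonic()

    def elapsed_ms() -> float:
        return (time.monotonic() - start) * 1000

    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            resp.read(1024)  # pull part of the body so a stalled response is noticed
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500 or 503.
        return ProbeResult(url, False, exc.code, elapsed_ms(), f"HTTP {exc.code}")
    except (OSError, http.client.HTTPException) as exc:
        # DNS failures, refused connections, TLS errors, and timeouts land here.
        return ProbeResult(url, False, None, elapsed_ms(), str(exc))

    latency = elapsed_ms()
    if latency > max_latency_ms:
        return ProbeResult(url, False, status, latency, "response too slow")
    return ProbeResult(url, True, status, latency)
```

Scheduled from a host outside your own infrastructure, a call such as `check_endpoint("https://app.example.com/login")` (placeholder URL) approximates one probe. Commercial services add probes from multiple regions, confirmation retries before alerting, and stored history that can serve as evidence.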
Publish uptime history on a public status page. This shows transparency and gives customers a historical record of your reliability. Include current service status, incident history, and uptime metrics.
Design for high availability at the infrastructure level. Deploy across multiple availability zones. Use load balancers to distribute traffic and handle server failures. Implement auto-scaling for traffic spikes. Use managed database services with automatic failover. Set up health checks that remove unhealthy instances automatically.
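One way to see why multi-zone redundancy matters is to estimate composite availability. A rough sketch, assuming component failures are independent (bad deploys, shared dependencies, and region-wide incidents break that assumption, so treat the result as an upper bound): components in series multiply their availabilities, while redundant replicas only cause an outage when all of them fail. The component figures below are hypothetical, not measurements.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component must be up, e.g. load balancer -> app tier -> database."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one replica must be up, e.g. the same tier in several availability zones.

    Assumes replicas fail independently; correlated failures make the real
    figure lower than this estimate.
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical component availabilities, not measured figures.
    app, db = 0.999, 0.999
    single_zone = serial(app, db)
    two_zones = serial(parallel(app, app), parallel(db, db))
    print(f"single zone: {single_zone:.4%}, two zones: {two_zones:.4%}")
```

With these illustrative numbers, two 99.9% tiers in a single zone combine to about 99.80%, already below a 99.9% target, while duplicating each tier across two zones pushes the estimate above 99.999%. This is the same reasoning behind treating single points of failure as a red flag.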
Build reliability into the application itself. Implement graceful degradation so core functionality keeps working when non-critical components fail. Add circuit breakers to prevent cascade failures. Use retry logic with exponential backoff for transient errors. Consider chaos engineering to proactively find reliability weaknesses.
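The three application-level patterns can be sketched together in Python. This is an illustrative sketch, not a production library: `CircuitBreaker`, `retry_with_backoff`, `call_with_fallback`, and every threshold are hypothetical. The retry uses capped exponential backoff with "full jitter" (a random delay between zero and the exponential delay) so that many clients do not retry in lockstep, and the breaker is deliberately simple and not thread-safe.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being short-circuited."""


class CircuitBreaker:
    """Stop calling a failing dependency for a cooldown, then allow a trial call.

    Not thread-safe; a real service would guard this state with a lock.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cooldown elapsed ("half-open"): let this call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Retry transient errors with capped exponential backoff and full jitter."""
    for attempt in range(attempts - 1):
        try:
            return fn()
        except retry_on:
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
    return fn()  # final attempt: any error propagates to the caller


def call_with_fallback(breaker: CircuitBreaker, fn: Callable[[], T], fallback: T) -> T:
    """Graceful degradation: serve a default instead of failing the whole request."""
    try:
        return breaker.call(lambda: retry_with_backoff(fn))
    except (CircuitOpenError, ConnectionError, TimeoutError):
        return fallback
```

In use, a non-critical feature such as recommendations might be wrapped as `call_with_fallback(breaker, fetch_recommendations, [])`, where `fetch_recommendations` is a hypothetical client call. The page still renders without recommendations when that dependency is down, and the open breaker stops every request from waiting on it.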
Track and report SLA compliance regularly. Calculate uptime monthly and annually. Investigate any month that falls below target. Develop and share action plans to prevent recurrence.
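Monthly uptime can be calculated directly from outage records. A minimal sketch, assuming outages are available as (start, end) pairs of naive UTC datetimes exported from your monitoring or incident tooling, and that only full outages count. Whether scheduled maintenance belongs in the calculation is a policy your SLA must define; this sketch simply counts whatever intervals it is given. The function names and sample data are hypothetical.

```python
from calendar import monthrange
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end), naive UTC assumed


def monthly_uptime_pct(year: int, month: int, outages: Iterable[Outage]) -> float:
    """Percentage of the calendar month during which the service was up."""
    month_start = datetime(year, month, 1)
    month_end = month_start + timedelta(days=monthrange(year, month)[1])

    # Clip outages to this month, drop ones entirely outside it, sort by start.
    clipped = sorted(
        (max(start, month_start), min(end, month_end))
        for start, end in outages
        if start < month_end and end > month_start
    )

    # Merge overlapping intervals so two alerts for one outage are not double-counted.
    merged: List[List[datetime]] = []
    for start, end in clipped:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    downtime = sum((end - start for start, end in merged), timedelta())
    return 100 * (1 - downtime / (month_end - month_start))


def months_below_target(uptime_by_month: Dict[str, float], target_pct: float = 99.9) -> List[str]:
    """Months that missed the target and need an investigation and action plan."""
    return [month for month, pct in sorted(uptime_by_month.items()) if pct < target_pct]


if __name__ == "__main__":
    sample = [  # hypothetical outage records
        (datetime(2024, 3, 31, 23, 50), datetime(2024, 4, 1, 0, 10)),  # spans two months
        (datetime(2024, 4, 3, 10, 0), datetime(2024, 4, 3, 10, 30)),
        (datetime(2024, 4, 3, 10, 20), datetime(2024, 4, 3, 10, 40)),  # overlaps the previous one
    ]
    report = {
        "2024-03": monthly_uptime_pct(2024, 3, sample),
        "2024-04": monthly_uptime_pct(2024, 4, sample),
    }
    print(report, months_below_target(report))
```

In the sample data, April accumulates 50 minutes of merged downtime (10 minutes carried over from March plus one 40-minute outage recorded as two overlapping incidents), which falls just below 99.9% and is flagged for investigation.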
Adopt error budgets. If your SLO is 99.9%, your error budget is 0.1% downtime per month. When the budget is healthy, prioritise features. When it is depleted, prioritise reliability. This gives you a data-driven way to balance the two.
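The error-budget arithmetic is small enough to automate in a dashboard or release checklist. A minimal sketch, assuming a 30-day month and downtime measured in minutes; `ErrorBudget` and the `depleted_at` policy knob are hypothetical, and the default follows the simple healthy-versus-depleted rule described above.

```python
from dataclasses import dataclass

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


@dataclass
class ErrorBudget:
    slo_pct: float  # e.g. 99.9
    period_minutes: float = MINUTES_PER_30_DAY_MONTH

    @property
    def budget_minutes(self) -> float:
        """Total downtime the SLO allows in the period (43.2 min at 99.9%)."""
        return (1 - self.slo_pct / 100) * self.period_minutes

    def remaining_minutes(self, downtime_minutes: float) -> float:
        return self.budget_minutes - downtime_minutes

    def remaining_fraction(self, downtime_minutes: float) -> float:
        return self.remaining_minutes(downtime_minutes) / self.budget_minutes

    def priority(self, downtime_minutes: float, depleted_at: float = 0.0) -> str:
        """Return "features" while the budget is healthy, "reliability" once depleted.

        `depleted_at` is a hypothetical policy knob: the remaining fraction at or
        below which the team switches to reliability work (0.0 = fully spent).
        """
        if self.remaining_fraction(downtime_minutes) > depleted_at:
            return "features"
        return "reliability"
```

For example, `ErrorBudget(99.9).priority(30)` leaves about 13.2 of 43.2 minutes and returns `"features"`, while 50 minutes of downtime overspends the budget and returns `"reliability"`. Some teams set `depleted_at` above zero so they slow feature work before the budget is fully gone.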
Evidence Your Auditor Will Request
- Historical uptime metrics for the past 12 months
- SLA or SLO definitions and compliance records
- External uptime monitoring configuration
- Public status page with incident and uptime history
- High availability architecture documentation
Common Mistakes
- No external uptime monitoring; uptime claims are not independently verified
- SLA targets set without understanding whether the architecture can deliver them
- Frequent outages caused by the same root causes recurring
- No public status page; customers have no visibility into service health
- Single points of failure in the architecture that could cause complete outages
Related Controls Across Frameworks
| Framework | Control ID | Relationship |
|---|---|---|
| SOC 2 | A1.1 | Related |
Frequently Asked Questions
What uptime is expected for a SaaS product?
Should we include scheduled maintenance in uptime calculations?
Track Tech Due Diligence compliance in one place
AuditFront helps you manage every Tech Due Diligence control, collect evidence, and stay audit-ready.