Tech Due Diligence OPS-4: Uptime and SLA Management

What This Control Requires

The assessor evaluates the historical uptime track record, SLA commitments and compliance, uptime monitoring mechanisms, and the engineering practices that contribute to service reliability.

In Plain Language

For B2B SaaS, uptime is not just a technical metric - it is a commercial promise. Customers expect at least 99.9% availability (roughly 8.7 hours of downtime per year), and enterprise customers often demand 99.95% or higher. Assessors want to know whether your infrastructure and practices can sustainably deliver the reliability your market expects.

They will review historical uptime data (monthly and annual), your SLA or SLO targets and whether you are actually hitting them, the number and severity of production incidents over the past 12 months, the biggest outages and their root causes, whether uptime is verified by independent external monitoring, and architectural features supporting high availability like redundancy, failover, and load balancing.

Reliability is a direct measure of operational quality. A solid uptime track record backed by genuine engineering practices gives investors confidence. A history of repeated outages with the same root causes does the opposite.

How to Implement

Define and publish SLA or SLO targets that fit your market. Common SaaS benchmarks: 99.9% (roughly 8.7 hours downtime per year) for standard tiers, 99.95% (roughly 4.4 hours) for enterprise tiers, 99.99% (roughly 52 minutes) for mission-critical services. Make sure your architecture can actually deliver the number before you commit to it.
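The downtime figures above follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year and a hypothetical helper named allowed_downtime_hours, the arithmetic looks like this:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumption: leap years ignored


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in hours, that an availability target permits."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


# Print the yearly allowance for the three benchmark tiers.
for target in (99.9, 99.95, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Running the same calculation against your own target before you sign an SLA makes the commitment concrete: 99.99% leaves under an hour per year for every incident, deployment problem, and failover combined.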

Set up external uptime monitoring with services like Pingdom, UptimeRobot, Better Uptime, or Datadog Synthetics. External monitoring catches issues that internal tools miss, such as network path problems and DNS failures. Monitor key user-facing endpoints, not just infrastructure health checks.
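To illustrate what an external check does, here is a minimal sketch of a single HTTP probe using only the Python standard library. It does not reproduce any vendor's product: ProbeResult, probe, and http_get_status are hypothetical names, and a real monitoring service would run checks like this on a schedule from several regions.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    ok: bool
    status: Optional[int]   # None when no HTTP response was received
    latency_s: float
    error: Optional[str] = None


def http_get_status(url: str, timeout_s: float) -> int:
    """Fetch the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def probe(url: str, timeout_s: float = 5.0,
          fetch: Callable[[str, float], int] = http_get_status) -> ProbeResult:
    """Run one check against a user-facing endpoint and classify the outcome."""
    started = time.monotonic()
    try:
        status = fetch(url, timeout_s)
        return ProbeResult(url, 200 <= status < 400, status,
                           time.monotonic() - started)
    except urllib.error.HTTPError as exc:
        # The service answered, but with a 4xx/5xx status.
        return ProbeResult(url, False, exc.code,
                           time.monotonic() - started, str(exc))
    except OSError as exc:
        # DNS failures, refused connections, and timeouts end up here.
        return ProbeResult(url, False, None,
                           time.monotonic() - started, str(exc))
```

Calling probe with the URL of a login page or a key API endpoint returns a result you can log or alert on. The fetch parameter exists so the probe can be exercised without network access.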

Publish uptime history on a public status page. This shows transparency and gives customers a historical record of your reliability. Include current service status, incident history, and uptime metrics.

Design for high availability at the infrastructure level. Deploy across multiple availability zones. Use load balancers to distribute traffic and handle server failures. Implement auto-scaling for traffic spikes. Use managed database services with automatic failover. Set up health checks that remove unhealthy instances automatically.
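Managed load balancers handle health checking for you, but the underlying idea is simple. The following is a toy sketch, not any cloud provider's implementation: a hypothetical Pool class takes an instance out of rotation after a number of consecutive failed health checks and puts it back once a check passes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Instance:
    name: str
    consecutive_failures: int = 0


class Pool:
    """Round-robin pool that skips instances failing their health checks."""

    def __init__(self, names: Iterable[str], unhealthy_threshold: int = 3):
        self.instances: Dict[str, Instance] = {n: Instance(n) for n in names}
        self.unhealthy_threshold = unhealthy_threshold
        self._cursor = 0

    def record_health_check(self, name: str, passed: bool) -> None:
        """Reset the failure count on success, increment it on failure."""
        inst = self.instances[name]
        inst.consecutive_failures = 0 if passed else inst.consecutive_failures + 1

    def healthy(self) -> List[str]:
        """Names of instances still below the failure threshold."""
        return [name for name, inst in self.instances.items()
                if inst.consecutive_failures < self.unhealthy_threshold]

    def next_instance(self) -> str:
        """Pick the next healthy instance in round-robin order."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy instances available")
        choice = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return choice
```

The threshold of consecutive failures is an assumption chosen for illustration. It reflects a common trade-off: too low and a single slow response removes a good instance, too high and users keep hitting a broken one.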

Build reliability into the application itself. Implement graceful degradation so core functionality keeps working when non-critical components fail. Add circuit breakers to prevent cascade failures. Use retry logic with exponential backoff for transient errors. Consider chaos engineering to proactively find reliability weaknesses.
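Two of these patterns are easy to show in code. The sketch below is illustrative only: TransientError, retry_with_backoff, CircuitOpenError, and CircuitBreaker are hypothetical names, the thresholds are arbitrary, and production systems usually rely on a maintained library rather than hand-rolled versions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, 503s)."""


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 4,
                       base_delay_s: float = 0.2, max_delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures, doubling the delay ceiling each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    """Fail fast after repeated failures so a sick dependency can recover."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

The two patterns complement each other. Retries smooth over brief blips, while the breaker stops retries from piling onto a dependency that is already down, which is how cascade failures usually start.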

Track and report SLA compliance regularly. Calculate uptime monthly and annually. Investigate any month that falls below target. Develop and share action plans to prevent recurrence.
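A monthly uptime figure can be derived from your incident log. The sketch below assumes a hypothetical Incident record with start and end timestamps, and counts overlapping incidents only once. Your SLA may define downtime differently (for example, partial outages or per-region weighting), so treat this as a starting point.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Incident:
    start: datetime
    end: datetime


def uptime_pct(incidents: Iterable[Incident],
               period_start: datetime, period_end: datetime) -> float:
    """Return uptime as a percentage of the reporting period."""
    total_s = (period_end - period_start).total_seconds()
    if total_s <= 0:
        raise ValueError("period_end must be after period_start")

    # Clip each incident to the reporting period and sort by start time.
    clipped = sorted(
        (max(i.start, period_start), min(i.end, period_end))
        for i in incidents
        if i.end > period_start and i.start < period_end
    )

    down_s = 0.0
    cursor = period_start  # latest point in time already counted as downtime
    for start, end in clipped:
        start = max(start, cursor)  # skip any part that overlaps earlier incidents
        if end > start:
            down_s += (end - start).total_seconds()
            cursor = end
    return 100 * (1 - down_s / total_s)


# Example: two overlapping incidents in June 2024 (made-up data).
june = [
    Incident(datetime(2024, 6, 5, 10, 0), datetime(2024, 6, 5, 10, 30)),
    Incident(datetime(2024, 6, 5, 10, 15), datetime(2024, 6, 5, 10, 45)),
]
result = uptime_pct(june, datetime(2024, 6, 1), datetime(2024, 7, 1))
print(f"June uptime: {result:.3f}%")
```

In this made-up example the two incidents overlap, so they count as 45 minutes of downtime rather than 60. That is already enough to miss a 99.9% monthly target, which is the kind of month the text above says to investigate.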

Adopt error budgets. If your SLO is 99.9%, your error budget is the remaining 0.1%: about 43 minutes of downtime in a 30-day month. When the budget is healthy, prioritise features. When it is depleted, prioritise reliability. This gives you a data-driven way to balance the two.
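The error budget calculation can be sketched in a few lines. The function names here are hypothetical, and the rule that "depleted" means zero budget remaining is an assumption; many teams set an earlier warning threshold.

```python
def error_budget_minutes(slo_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime the SLO allows over the period."""
    if not 0 <= slo_pct < 100:
        raise ValueError("slo_pct must be at least 0 and below 100")
    return period_days * 24 * 60 * (1 - slo_pct / 100)


def budget_remaining_fraction(slo_pct: float, downtime_minutes: float,
                              period_days: int = 30) -> float:
    """Fraction of the budget left; negative once the budget is overspent."""
    budget = error_budget_minutes(slo_pct, period_days)
    return (budget - downtime_minutes) / budget


def priority(slo_pct: float, downtime_minutes: float,
             period_days: int = 30) -> str:
    """Assumed policy: features while budget remains, reliability otherwise."""
    remaining = budget_remaining_fraction(slo_pct, downtime_minutes, period_days)
    return "features" if remaining > 0 else "reliability"


budget = error_budget_minutes(99.9)
print(f"99.9% SLO over 30 days allows {budget:.1f} minutes of downtime")
print("After 30 minutes down:", priority(99.9, 30))
print("After 50 minutes down:", priority(99.9, 50))
```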

Evidence Your Auditor Will Request

Historical uptime metrics for the past 12 months
SLA or SLO definitions and compliance records
External uptime monitoring configuration
Public status page with incident and uptime history
High availability architecture documentation

Common Mistakes

No external uptime monitoring; uptime claims are not independently verified
SLA targets set without understanding if the architecture can deliver them
Frequent outages with the same root causes recurring
No public status page; customers have no visibility into service health
Single points of failure in the architecture that could cause complete outages

Related Controls Across Frameworks

Framework	Control ID	Relationship
SOC 2	A1.1	Related

Frequently Asked Questions

What uptime is expected for a SaaS product?

99.9% is the standard expectation for B2B SaaS. Enterprise customers may push for 99.95% or 99.99%. Assessors judge whether the claimed uptime is credible based on architecture, operational practices, and incident history. Claiming 99.99% without proper redundancy and failover simply is not believable.

Should we include scheduled maintenance in uptime calculations?

Practice varies. A common, transparent approach is to exclude scheduled maintenance from SLA calculations and report it separately. For a modern SaaS product, you should be aiming for zero-downtime deployments anyway, which makes scheduled maintenance windows unnecessary for routine updates.
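To see how much the choice matters, the sketch below computes availability both ways. The function name and the input figures are illustrative assumptions. It also assumes one common convention for the exclusion, where maintenance time is removed from both the downtime and the measured period. Your contract may define it differently.

```python
def availability_pct(period_minutes: float, unplanned_down_minutes: float,
                     scheduled_maintenance_minutes: float = 0.0,
                     exclude_maintenance: bool = True) -> float:
    """Availability for a period, with or without scheduled maintenance."""
    if exclude_maintenance:
        # Maintenance is removed from the measured period entirely.
        measured = period_minutes - scheduled_maintenance_minutes
        down = unplanned_down_minutes
    else:
        # Maintenance counts as downtime like any other outage.
        measured = period_minutes
        down = unplanned_down_minutes + scheduled_maintenance_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return 100 * (1 - down / measured)


# Made-up month: 30 days, 20 minutes unplanned, 60 minutes scheduled.
period = 30 * 24 * 60
excluded = availability_pct(period, 20, 60, exclude_maintenance=True)
included = availability_pct(period, 20, 60, exclude_maintenance=False)
print(f"Excluding maintenance: {excluded:.3f}%")
print(f"Including maintenance: {included:.3f}%")
```

With these made-up numbers, the same month clears a 99.95% target under one convention and misses 99.9% under the other. That gap is why reporting maintenance separately matters for transparency.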
