What is a Service Level Objective (SLO)?
Published: Monday, 02 December 2024
A Service Level Objective (SLO) is an internal, measurable target for a service’s performance, availability, or quality. It represents the engineering team’s commitment to how well the service should perform for users or customers.
SLOs are typically defined in terms of metrics such as uptime (e.g., 99.9%), latency (e.g., 95% of requests finish in under 300 ms), or throughput.
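As a rough illustration of the latency example, the sketch below checks a batch of request latencies against a "95% under 300 ms" objective. The function names (`latency_sli`, `meets_slo`) and the sample values are hypothetical, chosen only for this example; a real system would read these numbers from its monitoring stack.

```python
def latency_sli(latencies_ms, threshold_ms=300.0):
    """Return the fraction of requests that finished under threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests in the measurement window")
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms)


def meets_slo(sli, target=0.95):
    """True when the measured indicator is at or above the objective."""
    return sli >= target


# Made-up latencies (in ms) for ten requests; two of them exceed 300 ms.
sample = [120, 180, 250, 90, 310, 205, 140, 275, 99, 450]
sli = latency_sli(sample)
print(f"SLI: {sli:.0%}, meets 95% target: {meets_slo(sli)}")
```

With this sample, 8 of 10 requests are fast enough, so the measured value (80%) falls short of the 95% objective.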
Why SLOs Matter as Much as SLAs
- Foundation for SLAs: SLOs are usually set slightly more stringent than the customer-facing SLA, creating a safety buffer so contractual commitments are met.
- Drives Alerting: SLOs provide the context for critical alerts. Firing notifications only when the SLO is close to being breached, rather than on every infrastructure blip, helps combat alert fatigue.
- Enables the Error Budget: SLOs define the Error Budget, the allowable downtime or failures over a period. When the error budget is depleted, you know you need to slow feature work and focus on reliability.
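The error budget follows directly from the SLO target: it is the share of the window in which the service is allowed to fail. A minimal sketch of that arithmetic, assuming a 30-day window and a hypothetical rule that warns once 80% of the budget is used (both values are illustrative, not recommendations):

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed downtime, in minutes, for an availability target over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


def budget_consumed(downtime_minutes, slo_target, window_days=30):
    """Fraction of the error budget already spent (1.0 means fully depleted)."""
    return downtime_minutes / error_budget_minutes(slo_target, window_days)


def should_alert(downtime_minutes, slo_target, window_days=30, warn_at=0.8):
    """Assumed policy: notify once warn_at of the budget has been used."""
    return budget_consumed(downtime_minutes, slo_target, window_days) >= warn_at


budget = error_budget_minutes(0.999)
print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
print("alert after 40 minutes of downtime:", should_alert(40, 0.999))
```

For a 99.9% target over 30 days, the budget works out to roughly 43.2 minutes, so 40 minutes of accumulated downtime is already close to a breach.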
Common Challenges
- Overly Aggressive Targets: Setting numbers that are technologically or financially unrealistic creates constant stress and burnout.
- Measurement Misalignment: Measuring SLOs only with infrastructure metrics (e.g., CPU load) instead of user-centric signals (e.g., checkout success rate) gives a false sense of reliability.
- Treating SLOs Like SLAs: Using SLOs as contractual penalties rather than as operational signals for internal improvement undermines their purpose.
How to Set the Right SLO
- Focus on User Journeys: Base SLOs on the most critical interactions (login API latency, purchase success rate) instead of low-level component health.
- Define the SLI First: Identify the Service Level Indicator (SLI), your trackable metric, before locking the objective.
- Use the Error Budget to Prioritize: When the budget is healthy, ship features; when it is nearly spent, pivot to reliability and bug fixes to stay within the SLO.
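The three steps above can be sketched together: pick a user-facing SLI (here, purchase success rate), compare it with the objective, and let the remaining budget suggest where to spend engineering time. The `evaluate` function, the `BudgetStatus` type, and the 20% reserve that marks the budget as "nearly spent" are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    sli: float               # measured success rate for the user journey
    budget_remaining: float  # fraction of the error budget still unspent
    focus: str               # "features" or "reliability"


def evaluate(good_events, total_events, slo_target, reserve=0.2):
    """Compare a success-rate SLI with its SLO and suggest a focus.

    reserve is an assumed threshold: at or below it, the budget is
    treated as nearly spent and reliability work takes priority.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good_events / total_events
    allowed_failures = total_events * (1.0 - slo_target)
    actual_failures = total_events - good_events
    remaining = 1.0 - actual_failures / allowed_failures
    focus = "features" if remaining > reserve else "reliability"
    return BudgetStatus(sli=sli, budget_remaining=remaining, focus=focus)


# Hypothetical purchase counts for one window, against a 99.9% objective.
healthy = evaluate(good_events=99_950, total_events=100_000, slo_target=0.999)
strained = evaluate(good_events=99_910, total_events=100_000, slo_target=0.999)
print(healthy.focus, strained.focus)
```

In the first case about half the budget is left, so feature work can continue; in the second only about a tenth remains, so the sketch points to reliability work.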