Managing to Your SLO in the Face of Chaos
Setting a Service Level Objective (SLO) for your service is only the start of your quantified reliability journey. What do you do when you've had a few too many incidents and blown your error budget? Or when a pile of near-misses has burned the team out, even though the user-facing SLO was never violated? What if the incident was triggered by an infrastructure refactoring that was meant to improve, not harm, reliability and maintainability? In this talk, you'll learn how the team at Honeycomb uses social practices and architectural design to handle incidents, run chaos engineering, and close the engineering feedback loop for reliability.