DevOps & Infrastructure · Engineering · Gold

Structured triage protocol for production system failures based on Google SRE practices and observability-driven methodology. Guides engineers through symptom scoping, service localisation via distributed traces, hypothesis ranking, mitigation evaluation, and structured incident documentation. Reduces Mean Time to Resolution by enforcing systematic investigation over ad-hoc troubleshooting.

Production Incident Triage

advanced · v5.0

What's inside

You are a Production Incident Triage specialist. You guide engineering teams through structured investigation and resolution of production system failures using distributed tracing analysis and systematic hypothesis ranking. - Separate mitigation (rapid service restoration) from root cause analysis,...

Covers

What You Do Differently · Methodology · Watch For · Output Format

SupaScore

83.35
Research Quality (15%): 8.4
Prompt Engineering (25%): 8.2
Practical Utility (15%): 8.6
Completeness (10%): 8
User Satisfaction (20%): 8.3
Decision Usefulness (15%): 8.5
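
The headline SupaScore is consistent with a weighted sum of the six sub-scores, scaled from a 0-10 range to 0-100. The sketch below checks that arithmetic. The formula is an assumption inferred from the listed weights and values, not a documented SupaSkills method, and the names `SUB_SCORES` and `weighted_score` are illustrative.

```python
from decimal import Decimal

# (weight, score) pairs copied from the score breakdown on this page.
# The weighting formula itself is an assumption, not a documented method.
SUB_SCORES = {
    "Research Quality": (Decimal("0.15"), Decimal("8.4")),
    "Prompt Engineering": (Decimal("0.25"), Decimal("8.2")),
    "Practical Utility": (Decimal("0.15"), Decimal("8.6")),
    "Completeness": (Decimal("0.10"), Decimal("8")),
    "User Satisfaction": (Decimal("0.20"), Decimal("8.3")),
    "Decision Usefulness": (Decimal("0.15"), Decimal("8.5")),
}


def weighted_score(sub_scores):
    """Return the weighted sum of 0-10 sub-scores, scaled to 0-100."""
    total_weight = sum(weight for weight, _ in sub_scores.values())
    if total_weight != 1:
        raise ValueError(f"weights sum to {total_weight}, expected 1")
    # Decimal keeps the arithmetic exact, avoiding binary float rounding.
    return sum(weight * score for weight, score in sub_scores.values()) * 10


supa_score = weighted_score(SUB_SCORES)
print(supa_score)
```

Under that assumption, the result equals the listed 83.35.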

Evidence Policy

Standard: no explicit evidence policy.

incident-response · production-debugging · sre · site-reliability · triage · distributed-tracing · observability · post-mortem · mitigation · root-cause-analysis · on-call · incident-management

Research Foundation: 8 sources (4 books, 1 official docs, 2 web, 1 paper)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v5.0 · 3/25/2026

v5.5 distilled from v2 via Claude Sonnet

v1.0.0 · 3/23/2026

Initial release via Pipeline v3

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited. Terms of Service · Legal Notice