DevOps & Infrastructure · Engineering · Platinum

Need a structured plan to handle IT system outages efficiently.

SRE Incident Response Expert

Incident Command System, SRE Framework

expert · v5.0

Best for

  • Designing incident command structure for production outages with clear role assignments
  • Building severity classification systems with objective response time and escalation criteria
  • Creating runbook-driven response playbooks for known failure modes with testing procedures
  • Establishing structured communication protocols for internal teams and external stakeholders

What you'll get

  • Incident Command System adaptation with IC/Ops Lead/Comms Lead role definitions and handoff procedures
  • Severity classification matrix with objective criteria, response SLAs, and required stakeholder involvement
  • Communication protocol templates with structured status updates and escalation triggers

Expects

Production incident scenarios, organizational context, existing tooling landscape, and current response gaps or pain points.

Returns

Structured incident response frameworks with role definitions, communication templates, severity matrices, runbook formats, and post-incident learning processes.

What's inside

You are an SRE Incident Response Expert. You engineer structured incident response processes for production outages that minimize Mean Time to Recovery (MTTR), reduce blast radius, and extract maximum learning from failures. - **Mitigation first, root cause later.** During active incidents, you imme...

Covers

What You Do Differently · Methodology · Watch For

Not designed for

  • Writing actual monitoring alerts or observability queries
  • Debugging specific technical issues during live incidents
  • Building the underlying infrastructure monitoring stack
  • Performing root cause analysis of complex distributed system failures

SupaScore

88.58

  • Research Quality (15%): 9.1
  • Prompt Engineering (25%): 8.95
  • Practical Utility (15%): 8.65
  • Completeness (10%): 8.85
  • User Satisfaction (20%): 8.8
  • Decision Usefulness (15%): 8.75
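
The page does not state how the headline SupaScore is derived from the six sub-scores. As a minimal sketch, assuming it is the weighted mean of the sub-scores (each on a 0 to 10 scale) multiplied by 10, the listed figures can be checked as follows. The names `SUBSCORES` and `weighted_score` are hypothetical and used only for illustration.

```python
# Sub-scores and weights copied from the SupaScore breakdown on this page.
# ASSUMPTION: headline score = 10 * weighted mean of the sub-scores.
# The page itself does not confirm this formula.
SUBSCORES = {
    "Research Quality": (9.1, 0.15),
    "Prompt Engineering": (8.95, 0.25),
    "Practical Utility": (8.65, 0.15),
    "Completeness": (8.85, 0.10),
    "User Satisfaction": (8.8, 0.20),
    "Decision Usefulness": (8.75, 0.15),
}


def weighted_score(subscores):
    """Return 10 * weighted mean of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in subscores.values())
    weighted_sum = sum(score * weight for score, weight in subscores.values())
    return 10 * weighted_sum / total_weight


score = weighted_score(SUBSCORES)
print(f"{score:.3f}")
```

Under that assumption the result is roughly 88.575, which agrees with the published 88.58 up to rounding.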

Evidence Policy

Standard: no explicit evidence policy.

incident-response · sre · blameless-postmortem · incident-command · on-call · severity-classification · runbooks · mttm · game-day · escalation-policy · production-reliability · chaos-engineering

Research Foundation: 7 sources (3 books, 3 official docs, 1 paper)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v5.0 · 3/25/2026

v5.5 distilled from v2 via Claude Sonnet

v2.0 · 2/26/2026

Pipeline v4: rebuilt with 3 helper skills

v1.0.0 · 2/16/2026

Initial release

Common Workflows

Production Reliability Program

Complete production reliability program from monitoring setup through incident response to continuous improvement

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.