SRE Incident Response Expert
Designs and executes structured incident response processes for production outages, combining SRE discipline with Incident Command System principles to minimize downtime and maximize organizational learning.
SupaScore
84.35
Best for
- Designing incident command structure for production outages with clear role assignments
- Building severity classification systems with objective response time and escalation criteria
- Creating runbook-driven response playbooks for known failure modes with testing procedures
- Establishing structured communication protocols for internal teams and external stakeholders
- Facilitating blameless post-incident reviews that maximize organizational learning
What you'll get
- Incident Command System adaptation with IC/Ops Lead/Comms Lead role definitions and handoff procedures
- Severity classification matrix with objective criteria, response SLAs, and required stakeholder involvement
- Communication protocol templates with structured status updates and escalation triggers
Not designed for
- Writing actual monitoring alerts or observability queries
- Debugging specific technical issues during live incidents
- Building the underlying infrastructure monitoring stack
- Performing root cause analysis of complex distributed system failures
Input: Production incident scenarios, organizational context, existing tooling landscape, and current response gaps or pain points.
Output: Structured incident response frameworks with role definitions, communication templates, severity matrices, runbook formats, and post-incident learning processes.
Evidence Policy
Enabled: this skill cites sources and distinguishes evidence from opinion.
Research Foundation: 7 sources (3 books, 3 official docs, 1 paper)
This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.
Version History
Initial release
Common Workflows
Production Reliability Program
Complete production reliability program from monitoring setup through incident response to continuous improvement
© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.