SRE Incident Response Expert

Designs and executes structured incident response processes for production outages, combining SRE discipline with Incident Command System principles to minimize downtime and maximize organizational learning.

Gold
v1.0.0 · 0 activations · DevOps & Infrastructure · Engineering · expert

SupaScore: 84.35

  • Research Quality (15%): 8.5
  • Prompt Engineering (25%): 8.5
  • Practical Utility (15%): 8.5
  • Completeness (10%): 8.4
  • User Satisfaction (20%): 8.3
  • Decision Usefulness (15%): 8.4
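
The overall SupaScore of 84.35 is consistent with a weighted average of the six category scores, scaled from a 10-point to a 100-point range. The page does not state the formula, so the sketch below is an assumption that happens to reproduce the listed number; the names `CATEGORY_SCORES` and `supa_score` are illustrative only.

```python
# Assumed formula (not stated on the page): weighted mean of the
# category scores on a 0-10 scale, multiplied by 10 to reach 0-100.

# category name -> (weight in percent, score out of 10), copied from the listing
CATEGORY_SCORES = {
    "Research Quality": (15, 8.5),
    "Prompt Engineering": (25, 8.5),
    "Practical Utility": (15, 8.5),
    "Completeness": (10, 8.4),
    "User Satisfaction": (20, 8.3),
    "Decision Usefulness": (15, 8.4),
}


def supa_score(categories):
    """Return the weighted mean of the category scores, scaled to 0-100."""
    total_weight = sum(weight for weight, _ in categories.values())
    weighted_sum = sum(weight * score for weight, score in categories.values())
    return round(weighted_sum / total_weight * 10, 2)


print(supa_score(CATEGORY_SCORES))
```

Under this assumed formula the weights sum to 100 and the weighted sum is 843.5, which gives 8.435 on the 10-point scale, matching the listed 84.35.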

Best for

  • Designing incident command structure for production outages with clear role assignments
  • Building severity classification systems with objective response time and escalation criteria
  • Creating runbook-driven response playbooks for known failure modes with testing procedures
  • Establishing structured communication protocols for internal teams and external stakeholders
  • Facilitating blameless post-incident reviews that maximize organizational learning

What you'll get

  • Incident Command System adaptation with IC/Ops Lead/Comms Lead role definitions and handoff procedures
  • Severity classification matrix with objective criteria, response SLAs, and required stakeholder involvement
  • Communication protocol templates with structured status updates and escalation triggers
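
To give a feel for the shape a severity classification matrix with objective criteria, response SLAs, and required stakeholder involvement might take, here is a minimal sketch. It is not output from this skill: every level name, threshold, timing, and stakeholder list below is a hypothetical placeholder chosen for illustration, and a real matrix would be derived from the organization's own context.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SeverityLevel:
    name: str
    criteria: str                  # objective trigger, written so it can be checked
    ack_within_minutes: int        # response SLA (hypothetical value)
    update_every_minutes: int      # status-update cadence (hypothetical value)
    stakeholders: Tuple[str, ...]  # roles that must be involved


# Hypothetical three-level matrix, ordered from most to least severe.
SEVERITY_MATRIX = (
    SeverityLevel("SEV1", "customer-facing, error rate >= 50%", 5, 30,
                  ("IC", "Ops Lead", "Comms Lead")),
    SeverityLevel("SEV2", "customer-facing, error rate >= 5%", 15, 60,
                  ("IC", "Ops Lead")),
    SeverityLevel("SEV3", "internal-only or error rate < 5%", 60, 240,
                  ("on-call engineer",)),
)


def classify(customer_facing: bool, error_rate_pct: float) -> SeverityLevel:
    """Map two objective signals to a severity level (hypothetical thresholds)."""
    if customer_facing and error_rate_pct >= 50.0:
        return SEVERITY_MATRIX[0]
    if customer_facing and error_rate_pct >= 5.0:
        return SEVERITY_MATRIX[1]
    return SEVERITY_MATRIX[2]


level = classify(customer_facing=True, error_rate_pct=12.0)
print(level.name, level.ack_within_minutes, level.stakeholders)
```

Keeping the criteria as measurable signals rather than judgment calls is what makes the classification objective: two responders looking at the same data should arrive at the same level.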

Not designed for

  • Writing actual monitoring alerts or observability queries
  • Debugging specific technical issues during live incidents
  • Building the underlying infrastructure monitoring stack
  • Performing root cause analysis of complex distributed system failures

Expects

Production incident scenarios, organizational context, existing tooling landscape, and current response gaps or pain points.

Returns

Structured incident response frameworks with role definitions, communication templates, severity matrices, runbook formats, and post-incident learning processes.

Evidence Policy

Enabled: this skill cites sources and distinguishes evidence from opinion.

incident-response, sre, blameless-postmortem, incident-command, on-call, severity-classification, runbooks, mttm, game-day, escalation-policy, production-reliability, chaos-engineering

Research Foundation: 7 sources (3 books, 3 official docs, 1 paper)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v1.0.0 · 2/16/2026

Initial release

Common Workflows

Production Reliability Program

Complete production reliability program from monitoring setup through incident response to continuous improvement

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited. Terms of Service · Legal Notice