
Error Handling & Resilience Engineer

Designs robust error handling hierarchies, retry strategies, circuit breaker patterns, and graceful degradation mechanisms that keep systems reliable under real-world failure conditions.

Gold
v1.0.0 · 0 activations · Software Engineering · Engineering · expert

SupaScore: 84.75

  • Research Quality (15%): 8.5
  • Prompt Engineering (25%): 8.5
  • Practical Utility (15%): 9
  • Completeness (10%): 8.5
  • User Satisfaction (20%): 8
  • Decision Usefulness (15%): 8.5

Best for

  • Design distributed system fault tolerance patterns for microservices architectures
  • Implement circuit breaker patterns to prevent cascading failures in API calls
  • Create retry strategies with exponential backoff and jitter for external service calls
  • Build graceful degradation mechanisms for systems with multiple failure modes
  • Design error hierarchies that distinguish between retriable and permanent failures
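As a rough illustration of two items from the list above (retries with exponential backoff and jitter, and an error hierarchy that separates retriable from permanent failures), here is a minimal Python sketch. It is not taken from the skill itself: the class names `TransientError` and `PermanentError`, the helper names, and the default values (base delay, cap, attempt count) are all assumptions chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical base class for failures worth retrying (e.g. timeouts)."""


class PermanentError(Exception):
    """Hypothetical base class for failures a retry cannot fix (e.g. bad input)."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(operation, max_attempts=4, sleep=time.sleep):
    """Run operation(), retrying only on TransientError, at most max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the last failure
            sleep(backoff_delay(attempt))
        # PermanentError (and any other exception) propagates immediately.
```

A caller would wrap an outbound request in a function that raises `TransientError` for conditions such as timeouts and `PermanentError` for conditions such as invalid requests, then pass it to `call_with_retry`. The jitter spreads retries from many clients over time so they do not all hit a recovering service at the same moment.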

What you'll get

  • Circuit breaker implementation with state transitions, failure thresholds, and health check mechanisms including code samples
  • Comprehensive error hierarchy taxonomy distinguishing transient vs permanent failures with routing logic
  • Multi-layer resilience strategy with timeout propagation, fallback mechanisms, and observability integration
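To make the circuit breaker deliverable above more concrete, the following is a minimal sketch of the usual closed / open / half-open state machine in Python. It is an illustration under stated assumptions, not the skill's actual output: the class name, the threshold and timeout defaults, and the injectable `clock` parameter are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable so tests can control time
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = self.HALF_OPEN  # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = self.CLOSED
```

While the breaker is open, callers receive `CircuitOpenError` without the dependency being contacted, which is where a fallback or degraded response would typically be returned. A production version would likely add locking, metrics, and a distinction between error types that should and should not count toward the threshold.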

Not designed for

  • Frontend JavaScript error handling or try-catch patterns
  • Business logic validation or user input error handling
  • Database query optimization or SQL error handling
  • Security incident response or breach recovery procedures

Expects

System architecture diagrams, dependency maps, existing error patterns, failure mode descriptions, and current observability setup details.

Returns

Detailed implementations of resilience patterns, with code examples, circuit breaker configurations, retry policies, monitoring setups, and runbook procedures.

Evidence Policy

Enabled: this skill cites sources and distinguishes evidence from opinion.

error-handling · resilience · circuit-breaker · retry-patterns · fault-tolerance · graceful-degradation · observability · distributed-systems · chaos-engineering · reliability · sre · production-readiness

Research Foundation: 6 sources (2 industry frameworks, 4 official docs)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v1.0.0 · 2/14/2026

Initial version

Common Workflows

Production Resilience Implementation

Design fault-tolerant systems, implement observability, validate with chaos experiments, and prepare incident response procedures

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited. Terms of Service · Legal Notice
