Error Handling & Resilience Engineer
Designs robust error handling hierarchies, retry strategies, circuit breaker patterns, and graceful degradation mechanisms that keep systems reliable under real-world failure conditions.
SupaScore: 84.75
Best for
- Design distributed system fault tolerance patterns for microservices architectures
- Implement circuit breaker patterns to prevent cascading failures in API calls
- Create retry strategies with exponential backoff and jitter for external service calls
- Build graceful degradation mechanisms for systems with multiple failure modes
- Design error hierarchies that distinguish between retriable and permanent failures
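As a rough illustration of two of the items above (an error hierarchy that separates retriable from permanent failures, and retries with exponential backoff and jitter), here is a minimal Python sketch. All class and function names (`ServiceError`, `TransientError`, `PermanentError`, `backoff_delay`, `call_with_retry`) and the default values are assumptions made for this example, not part of the skill itself. The jitter variant shown is "full jitter", one common choice among several.

```python
import random
import time


class ServiceError(Exception):
    """Base class for errors from an external call (hypothetical name)."""


class TransientError(ServiceError):
    """Failure that may succeed on retry, e.g. a timeout."""


class PermanentError(ServiceError):
    """Failure that retrying cannot fix, e.g. a rejected request."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delay: rng() scaled by min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(func, max_attempts=4, base=0.1, cap=5.0, sleep=time.sleep):
    """Call func, retrying only on TransientError, for at most max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted, surface the last error
            sleep(backoff_delay(attempt, base, cap))
        # PermanentError (and any other exception) propagates immediately
```

Passing `sleep` and `rng` as parameters is a design choice for testability: a test can record the computed delays instead of actually waiting.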
What you'll get
- Circuit breaker implementation with state transitions, failure thresholds, and health check mechanisms including code samples
- Comprehensive error hierarchy taxonomy distinguishing transient vs permanent failures with routing logic
- Multi-layer resilience strategy with timeout propagation, fallback mechanisms, and observability integration
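To give a sense of what a circuit breaker with state transitions and a failure threshold can look like, here is a minimal Python sketch. It is not the skill's actual output: the names `CircuitBreaker` and `CircuitOpenError`, the three state labels, and the default threshold and timeout are all assumptions chosen for illustration. The sketch is single-threaded and omits health checks and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical three-state breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for tests
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = "closed"
```

In use, a caller would wrap each outbound request, for example `breaker.call(lambda: fetch_profile(user_id))` where `fetch_profile` is a placeholder, and treat `CircuitOpenError` as the signal to serve a fallback instead of waiting on a failing dependency.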
Not designed for
- Frontend JavaScript error handling or try-catch patterns
- Business logic validation or user input error handling
- Database query optimization or SQL error handling
- Security incident response or breach recovery procedures
Input: System architecture diagrams, dependency maps, existing error patterns, failure mode descriptions, and current observability setup details.
Output: Detailed implementations of resilience patterns with code examples, circuit breaker configurations, retry policies, monitoring setups, and runbook procedures.
Evidence Policy
Enabled: this skill cites sources and distinguishes evidence from opinion.
Research Foundation: 6 sources (2 industry frameworks, 4 official docs)
This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.
Version History
Initial version
Common Workflows
Production Resilience Implementation
Design fault-tolerant systems, implement observability, validate with chaos experiments, and prepare incident response procedures
© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.