Ensure your software system remains stable and reliable during failures.
Error Handling & Resilience Engineer
Microservices, Fault Tolerance, Reliability
Best for
- Design distributed system fault tolerance patterns for microservices architectures
- Implement circuit breaker patterns to prevent cascading failures in API calls
- Create retry strategies with exponential backoff and jitter for external service calls
- Build graceful degradation mechanisms for systems with multiple failure modes
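As an illustration of the retry bullet above, here is a minimal sketch of exponential backoff with "full jitter". It is not taken from the skill itself: `TransientError`, `backoff_delays`, `call_with_retry`, and the default values for `base` and `cap` are all hypothetical names and numbers chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def backoff_delays(retries, base=0.1, cap=5.0, rng=random.random):
    """Yield one delay per retry, drawn uniformly from [0, min(cap, base * 2**n))."""
    for n in range(retries):
        yield rng() * min(cap, base * (2 ** n))


def call_with_retry(func, attempts=4, base=0.1, cap=5.0, sleep=time.sleep):
    """Call func(); on TransientError, wait a jittered delay and try again.

    Makes at most `attempts` calls in total. Any other exception
    propagates immediately, since it is assumed not to be retryable.
    """
    delays = backoff_delays(attempts - 1, base, cap)
    while True:
        try:
            return func()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: re-raise the last error
            sleep(delay)
```

The jitter spreads retries from many clients over time so they do not hit a recovering service in lockstep. Passing `sleep` as a parameter is a design choice made here so the function can be exercised without real waiting.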
What you'll get
- Circuit breaker implementation with state transitions, failure thresholds, and health check mechanisms including code samples
- Comprehensive error hierarchy taxonomy distinguishing transient vs permanent failures with routing logic
- Multi-layer resilience strategy with timeout propagation, fallback mechanisms, and observability integration
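To make the circuit breaker deliverable concrete, the sketch below shows one common way to model the closed, open, and half-open states with a failure threshold. It is an assumption-laden illustration, not the skill's actual output: the class name, the state strings, and the defaults for `failure_threshold` and `reset_timeout` are hypothetical, and the code is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical three-state breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency marked unhealthy")
            # Timeout elapsed: let one trial call through as a health check.
            self.state = "half_open"
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call reopens at once; otherwise wait for the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```

While the breaker is open, callers fail fast instead of queueing behind a struggling dependency, which is what limits cascading failures. The injectable `clock` is a choice made for this example so the state transitions can be checked without waiting.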
System architecture diagrams, dependency maps, existing error patterns, failure mode descriptions, and current observability setup details.
Detailed resilience patterns implementation with code examples, circuit breaker configurations, retry policies, monitoring setups, and runbook procedures.
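The transient-versus-permanent taxonomy with routing logic mentioned above could look roughly like the following. This is only a sketch under stated assumptions: every class name, the HTTP status mapping, and the three route labels (`retry`, `fail_fast`, `escalate`) are hypothetical and are not specified by the skill.

```python
class ServiceError(Exception):
    """Base of the hypothetical hierarchy; also used for unclassified errors."""


class TransientError(ServiceError):
    """May succeed if the same request is tried again later."""


class PermanentError(ServiceError):
    """Will keep failing until the request itself changes."""


class UpstreamUnavailable(TransientError):
    pass


class RateLimited(TransientError):
    pass


class InvalidRequest(PermanentError):
    pass


class NotFound(PermanentError):
    pass


def classify_status(status):
    """Map an HTTP error status to an error class (an illustrative mapping).

    Assumes `status` is already known to be an error (400 or above).
    """
    if status == 429:
        return RateLimited
    if status in (502, 503, 504):
        return UpstreamUnavailable
    if status == 404:
        return NotFound
    if 400 <= status < 500:
        return InvalidRequest
    return ServiceError  # e.g. a bare 500: not enough information to classify


def route(error):
    """Pick a handling path for an exception instance."""
    if isinstance(error, TransientError):
        return "retry"
    if isinstance(error, PermanentError):
        return "fail_fast"
    return "escalate"  # unclassified: surface it rather than hide it
```

Routing on the class rather than on message text keeps the decision in one place, and anything that falls outside the hierarchy is escalated instead of being silently retried or dropped.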
What's inside
“You are a Resilience Engineering Specialist. You help teams build systems that survive failure gracefully by designing error handling, retry logic, circuit breakers, and observable failures. - **Fail explicitly, not silently** by classifying errors (transient vs. permanent) and routing them to appro...”
Not designed for
- Frontend JavaScript error handling or try-catch patterns
- Business logic validation or user input error handling
- Database query optimization or SQL error handling
- Security incident response or breach recovery procedures
SupaScore
88.83
Evidence Policy
Standard: no explicit evidence policy.
Research Foundation: 6 sources (2 industry frameworks, 4 official docs)
This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.
Version History
v5.5 distilled from v2 via Claude Sonnet
Pipeline v4: rebuilt with 3 helper skills
Initial version
Common Workflows
Production Resilience Implementation
Design fault-tolerant systems, implement observability, validate with chaos experiments, and prepare incident response procedures
© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.