Error Handling & Resilience Engineer

Designs robust error handling hierarchies, retry strategies, circuit breaker patterns, and graceful degradation mechanisms that keep systems reliable under real-world failure conditions.

Gold

v1.0.0 | 0 activations | Software Engineering | Engineering | expert

SupaScore

84.75

Research Quality (15%)

8.5

Prompt Engineering (25%)

8.5

Practical Utility (15%)

Completeness (10%)

8.5

User Satisfaction (20%)

Decision Usefulness (15%)

8.5

Best for

▸ Design distributed system fault tolerance patterns for microservices architectures
▸ Implement circuit breaker patterns to prevent cascading failures in API calls
▸ Create retry strategies with exponential backoff and jitter for external service calls
▸ Build graceful degradation mechanisms for systems with multiple failure modes
▸ Design error hierarchies that distinguish between retriable and permanent failures
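
As a rough illustration of two of the items above (an error hierarchy that separates retriable from permanent failures, and retries with exponential backoff and jitter), here is a minimal Python sketch. It is not taken from the skill itself: the class names, the `call_with_retry` helper, and the default delays are all hypothetical, and the "full jitter" variant shown is only one of several common jitter schemes.

```python
import random
import time


class ServiceError(Exception):
    """Base class for errors from a (hypothetical) external call."""


class TransientError(ServiceError):
    """Retriable: timeouts, throttling, brief unavailability."""


class PermanentError(ServiceError):
    """Not retriable: bad request, missing resource, auth failure."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delay: rng() * min(cap, base * 2**attempt) seconds."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(func, max_attempts=4, base=0.1, cap=5.0, sleep=time.sleep):
    """Call func(), retrying only on TransientError.

    max_attempts counts total calls, not extra retries. PermanentError
    (and any other exception) propagates immediately without a retry.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last transient error
            sleep(backoff_delay(attempt, base, cap))
```

Passing `sleep` and `rng` as parameters is a design choice for testability; a caller would normally leave them at their defaults and wrap only calls that are safe to repeat.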

What you'll get

● Circuit breaker implementation with state transitions, failure thresholds, and health check mechanisms including code samples
● Comprehensive error hierarchy taxonomy distinguishing transient vs permanent failures with routing logic
● Multi-layer resilience strategy with timeout propagation, fallback mechanisms, and observability integration
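
To make the first deliverable more concrete, the following is a minimal sketch of a circuit breaker with closed, open, and half-open states, a failure threshold, and an optional fallback. It is an assumption-laden illustration rather than the skill's actual output: `CircuitBreaker`, `CircuitOpenError`, and every parameter name and default are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for tests
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback=None):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = self.HALF_OPEN  # let one trial call through
            elif fallback is not None:
                return fallback()            # degrade without calling func
            else:
                raise CircuitOpenError("circuit is open")
        try:
            result = func()
        except Exception:
            self._on_failure()
            if fallback is not None:
                return fallback()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success closes the circuit and clears the failure count.
        self.state = self.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or reaching the threshold, opens the circuit.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()
```

A caller might wrap a remote lookup as `breaker.call(fetch_profile, fallback=cached_profile)`, where both function names are placeholders; counting consecutive failures, as done here, is simpler than the sliding-window failure rates some libraries use.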

Not designed for

× Frontend JavaScript error handling or try-catch patterns
× Business logic validation or user input error handling
× Database query optimization or SQL error handling
× Security incident response or breach recovery procedures

Expects

System architecture diagrams, dependency maps, existing error patterns, failure mode descriptions, and current observability setup details.

Returns

Detailed resilience patterns implementation with code examples, circuit breaker configurations, retry policies, monitoring setups, and runbook procedures.

Evidence Policy

Enabled: this skill cites sources and distinguishes evidence from opinion.

error-handling, resilience, circuit-breaker, retry-patterns, fault-tolerance, graceful-degradation, observability, distributed-systems, chaos-engineering, reliability, sre, production-readiness

Research Foundation: 6 sources (2 industry frameworks, 4 official docs)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v1.0.0 (2/14/2026)

Initial version

Prerequisites

Use these skills first for best results.

API Design Architect (Gold), Event-Driven Architecture Designer (Gold)

Works well with

API Performance Testing Expert (Gold), Chaos Engineering Practitioner (Gold), Distributed Tracing Engineer (Gold), Observability Pipeline Designer (Gold), Site Reliability Engineer (Platinum)

Need more depth?

Specialist skills that go deeper in areas this skill touches.

Chaos Engineering Practitioner (Gold), Incident Response Playbook Builder (Gold), SRE Incident Response Expert (Gold)

Common Workflows

Production Resilience Implementation

Design fault-tolerant systems, implement observability, validate with chaos experiments, and prepare incident response procedures.

error-handling-resilience-engineer → Distributed Tracing Engineer → Chaos Engineering Practitioner → Incident Response Playbook Builder
