Software Engineering · Engineering · Platinum

Ensure your software system remains stable and reliable during failures.

Error Handling & Resilience Engineer

Microservices, Fault Tolerance, Reliability

expert · v5.0

Best for

  • Design distributed system fault tolerance patterns for microservices architectures
  • Implement circuit breaker patterns to prevent cascading failures in API calls
  • Create retry strategies with exponential backoff and jitter for external service calls
  • Build graceful degradation mechanisms for systems with multiple failure modes
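As an illustration of the retry item above, here is a minimal sketch of exponential backoff with full jitter. The names `TransientError`, `backoff_delays`, and `call_with_retry`, and the default values for `base`, `cap`, and `attempts`, are assumptions made for this example and are not taken from the skill itself.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one full-jitter delay per retry.

    Each delay is drawn uniformly from [0, min(cap, base * 2**n)).
    """
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def call_with_retry(fn, attempts=4, base=0.1, cap=5.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(); on TransientError, retry up to `attempts` more times."""
    delays = backoff_delays(attempts, base, cap, rng)
    while True:
        try:
            return fn()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the last failure
            sleep(delay)
```

Jitter spreads retries from many clients over time so they do not hit a recovering service in lockstep. Any other exception type propagates immediately, which keeps permanent failures from being retried.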

What you'll get

  • Circuit breaker implementation with state transitions, failure thresholds, and health check mechanisms including code samples
  • Comprehensive error hierarchy taxonomy distinguishing transient vs permanent failures with routing logic
  • Multi-layer resilience strategy with timeout propagation, fallback mechanisms, and observability integration
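To make the circuit breaker item concrete, the following is a minimal single-threaded sketch of the three usual states (closed, open, half-open). The class name, the `failure_threshold` and `reset_timeout` parameters, and their defaults are assumptions for illustration only; the skill's own implementation is not shown on this page.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency marked unhealthy")
            self.state = "half_open"  # timeout elapsed: allow one probe call
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```

This sketch counts consecutive failures and is not thread-safe; a production version would likely add locking, a sliding failure window, and metrics on each state transition.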

Expects

System architecture diagrams, dependency maps, existing error patterns, failure mode descriptions, and current observability setup details.

Returns

Detailed resilience patterns implementation with code examples, circuit breaker configurations, retry policies, monitoring setups, and runbook procedures.

What's inside

You are a Resilience Engineering Specialist. You help teams build systems that survive failure gracefully by designing error handling, retry logic, circuit breakers, and observable failures.

  • Fail explicitly, not silently by classifying errors (transient vs. permanent) and routing them to appro...
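The transient vs. permanent classification mentioned above could be sketched as a small exception hierarchy with a routing function. All class names and the route labels (`"retry"`, `"fail_fast"`, `"escalate"`) are hypothetical and chosen only for this example; the excerpt is truncated and does not say how the skill itself routes errors.

```python
class ServiceError(Exception):
    """Base class for failures raised when calling a dependency."""
    retryable = False


class TransientError(ServiceError):
    """Likely to succeed if tried again later (timeouts, throttling)."""
    retryable = True


class PermanentError(ServiceError):
    """Will not succeed on retry without a change to the request."""
    retryable = False


class RateLimited(TransientError):
    pass


class UpstreamTimeout(TransientError):
    pass


class InvalidRequest(PermanentError):
    pass


def route(error):
    """Pick a handling path for an error based on its class."""
    if isinstance(error, ServiceError):
        return "retry" if error.retryable else "fail_fast"
    # Unclassified errors are surfaced rather than silently swallowed.
    return "escalate"
```

Routing on the class rather than on message strings keeps the decision in one place, so a newly classified failure mode only needs a new subclass.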

Covers

What You Do Differently · Methodology · Watch For

Not designed for

  • Frontend JavaScript error handling or try-catch patterns
  • Business logic validation or user input error handling
  • Database query optimization or SQL error handling
  • Security incident response or breach recovery procedures

SupaScore

88.83

  • Research Quality (15%): 8.85
  • Prompt Engineering (25%): 9.2
  • Practical Utility (15%): 8.65
  • Completeness (10%): 8.85
  • User Satisfaction (20%): 8.8
  • Decision Usefulness (15%): 8.75

Evidence Policy

Standard: no explicit evidence policy.

error-handling, resilience, circuit-breaker, retry-patterns, fault-tolerance, graceful-degradation, observability, distributed-systems, chaos-engineering, reliability, sre, production-readiness

Research Foundation: 6 sources (2 industry frameworks, 4 official docs)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v5.0 · 3/25/2026

v5.5 distilled from v2 via Claude Sonnet

v2.0 · 2/22/2026

Pipeline v4: rebuilt with 3 helper skills

v1.0.0 · 2/14/2026

Initial version

Common Workflows

Production Resilience Implementation

Design fault-tolerant systems, implement observability, validate with chaos experiments, and prepare incident response procedures

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.