Datadog Monitoring Expert
Comprehensive Datadog monitoring expertise covering APM, infrastructure monitoring, log management, custom metrics, dashboard design, alerting strategies, SLO/SLI definition, distributed tracing, Agent configuration, cost optimization, and integration patterns for production environments.
SupaScore
84.4
Best for
- Configuring Datadog Agent on Kubernetes clusters with custom metric collection and APM tracing
- Setting up SLO-based alerting for microservices with error budget burn rate notifications
- Implementing cost-effective custom metrics governance using Metrics without Limits
- Designing executive dashboards showing business KPIs correlated with infrastructure performance
- Troubleshooting distributed tracing performance issues across multi-cloud environments
What you'll get
- Step-by-step Kubernetes DaemonSet configuration with YAML manifests for Agent deployment, including APM, logs, and custom metrics collection
- SLO definition templates with error budget policies, burn rate alerting thresholds, and escalation playbooks following SRE best practices
- Cost optimization audit report identifying high-cardinality metrics and unused monitors, plus a Metrics without Limits implementation plan with projected savings
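To make the burn rate alerting thresholds mentioned above concrete, here is a minimal sketch of the underlying arithmetic. It is not Datadog's implementation and calls no Datadog API. The function names are hypothetical, and the default threshold of 14.4 is an assumption borrowed from the commonly cited multi-window example in Google's SRE Workbook (spending 2% of a 30-day error budget within one hour); a real policy would pick thresholds to suit the service.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Return how many times faster than sustainable the error budget is spent.

    A burn rate of 1.0 means the budget would be exhausted exactly at the
    end of the SLO window; higher values exhaust it sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target  # allowed fraction of failed requests
    return error_rate / error_budget


def should_page(
    short_window_error_rate: float,
    long_window_error_rate: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed example value, not a Datadog default
) -> bool:
    """Page only when both the short and the long window exceed the threshold.

    Requiring both windows reduces alerts on brief spikes (long window stays
    low) and on incidents that have already recovered (short window drops).
    """
    return (
        burn_rate(short_window_error_rate, slo_target) >= threshold
        and burn_rate(long_window_error_rate, slo_target) >= threshold
    )
```

For a 99.9% SLO, a sustained 2% error rate in both windows corresponds to a burn rate of about 20, which would page under this sketch; a 2% spike in the short window alone would not.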
Not designed for
- Setting up competing monitoring tools like Prometheus, Grafana, or New Relic
- Deep application code debugging or performance optimization (beyond observability)
- General cloud architecture design unrelated to monitoring
- Writing custom Datadog integrations or developing against Datadog APIs
Expected input: current infrastructure details (cloud provider, container orchestration, service count), existing Datadog products in use, specific monitoring pain points, and the current tagging strategy.
Expected output: detailed configuration guides, monitoring strategy recommendations, dashboard templates, alerting playbooks, and cost optimization tactics tied to specific Datadog features.
Evidence Policy
Enabled: this skill cites sources and distinguishes evidence from opinion.
Research Foundation: 10 sources (7 official docs, 1 industry framework, 1 book, 1 web source)
This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.
Version History
Initial release
Common Workflows
Production Observability Implementation
Deploy infrastructure, implement comprehensive monitoring with SLOs, then create incident response procedures based on the monitoring data.
© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited.