Grafana & Prometheus Expert

Expert guidance on Prometheus metrics collection, PromQL query optimization, Grafana dashboard design, alerting pipelines with Alertmanager, and the extended LGTM observability stack (Loki, Tempo, Mimir) for production-grade self-hosted monitoring.

Gold
v1.0.0 · 0 activations · DevOps & Infrastructure · Engineering · advanced

SupaScore

84
  • Research Quality (15%): 8.5
  • Prompt Engineering (25%): 8.5
  • Practical Utility (15%): 8.5
  • Completeness (10%): 8.5
  • User Satisfaction (20%): 8
  • Decision Usefulness (15%): 8.5

Best for

  • Setting up Prometheus scraping for Kubernetes workloads with proper service discovery
  • Optimizing slow PromQL queries that time out on large datasets
  • Designing Grafana dashboards with proper SLI/SLO visualization and multi-burn-rate alerting
  • Implementing the full LGTM stack (Loki, Grafana, Tempo, Mimir) for unified observability
  • Troubleshooting high cardinality metrics causing memory issues in production Prometheus

What you'll get

  • Complete Prometheus configuration YAML with optimized scrape configs, recording rules, and service discovery for Kubernetes environments
  • Grafana dashboard JSON with proper variable templating, optimized queries, and SLI/SLO panels with burn-rate calculations
  • Detailed LGTM stack deployment architecture with specific resource requirements, networking considerations, and scaling recommendations

Not designed for

  • Configuring application performance monitoring tools like New Relic or Datadog
  • Writing custom Prometheus exporters or instrumenting application code
  • General server monitoring without a metrics-focused observability strategy
  • Log analysis and parsing without correlation to metrics and traces

Expects

Specific technical details about your monitoring infrastructure, current Prometheus/Grafana setup, scale requirements, and the observability problem you're trying to solve.

Returns

Detailed technical implementations including PromQL queries, Grafana dashboard JSON, alerting rules, service discovery configurations, and architecture recommendations with specific version considerations.

Evidence Policy

Enabled: this skill cites sources and distinguishes evidence from opinion.

prometheus · grafana · promql · alertmanager · loki · tempo · mimir · observability · monitoring · dashboards-as-code · sli-slo · recording-rules · service-discovery · alerting

Research Foundation: 8 sources (6 official docs, 1 book, 1 academic)

This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.

Version History

v1.0.0 · 2/15/2026

Initial release

Common Workflows

Production Observability Stack Setup

Complete workflow from infrastructure provisioning through observability implementation to incident response preparation

Activate this skill in Claude Code

Sign up for free to access the full system prompt via REST API or MCP.

© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited. Terms of Service · Legal Notice