How We Score AI Skills
Every skill in the SupaSkills catalogue is scored on six dimensions using SupaScore. The rubric is public. The minimum score to publish is 80/100. Here's how it works.
Why we score
Not all system prompts are equal. A one-line “you are an expert” prompt and a 3,000-token researched methodology both technically work — but they produce fundamentally different output.
We built SupaScore to measure that difference. Every skill in our catalogue passes through the same quality evaluation before it goes live. If it scores below 80, it doesn't ship.
The scoring system isn't marketing. It's our quality gate.
The 6 dimensions
Each skill is evaluated on six dimensions. The composite SupaScore is a weighted average of the six dimension scores; a sketch of the calculation follows the rubric below.
Research Quality
Weight: 15%. Does the skill draw on verified, high-quality sources? Are frameworks correctly applied? Is the domain knowledge accurate and current?
- 5+ cited sources required
- Minimum 2 source types (e.g. book + official docs)
- No single-source dependency
- Factual accuracy verified
Prompt Engineering
Weight: 25%. Is the system prompt well-structured? Does it use clear instructions, structured output formats, and effective techniques?
- Clear role definition and task framing
- Structured output format specified
- Edge cases and constraints addressed
- Efficient token usage (no bloat)
Practical Utility
Weight: 15%. Does the skill produce output that is directly useful? Can the user act on it without significant rework?
- Output is actionable, not just informational
- Format matches real-world use (report, checklist, analysis)
- Reduces time vs. the manual approach
- Works across typical use cases in the domain
Completeness
Weight: 10%. Does the skill cover the domain adequately? Are there obvious gaps or missing perspectives?
- Core aspects of the domain addressed
- Common edge cases handled
- Guardrails for out-of-scope requests
- Appropriate depth (not superficial, not overloaded)
User Satisfaction
Weight: 20%. Does the output feel right? Is it clear, well-organised, and professional?
- Output is readable and well-structured
- Tone matches the domain (formal for legal, practical for engineering)
- No hallucination-prone instructions
- Consistent quality across different inputs
Decision Usefulness
Weight: 15%. Does the skill help the user make better decisions? Does it surface options, risks, and trade-offs?
- Presents alternatives, not just one answer
- Identifies risks and limitations
- Adapts to the user's specific context
- Supports informed decision-making
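The composite is simply the weighted average of the six dimension scores using the weights above. A minimal sketch in Python, assuming each dimension is scored 0–100 (the key names and function are illustrative, not the actual implementation):

```python
# Published rubric weights; they sum to 1.0.
WEIGHTS = {
    "research_quality": 0.15,
    "prompt_engineering": 0.25,
    "practical_utility": 0.15,
    "completeness": 0.10,
    "user_satisfaction": 0.20,
    "decision_usefulness": 0.15,
}

def supascore(dimension_scores: dict[str, float]) -> float:
    """Composite SupaScore: weighted average of the six dimension scores (0-100)."""
    return round(sum(WEIGHTS[dim] * dimension_scores[dim] for dim in WEIGHTS), 2)

# Example: solid scores across the board land in the mid-80s.
print(supascore({
    "research_quality": 88,
    "prompt_engineering": 86,
    "practical_utility": 85,
    "completeness": 82,
    "user_satisfaction": 87,
    "decision_usefulness": 84,
}))  # -> 85.65
```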
The score scale
Production floor: 80 (Gold tier). Nothing below 80 enters the catalogue. Current range: 80.00 – 89.75. Average: 84.2.
- 95 – 100: Expert-verified. Available to Max users.
- 85 – 94: Excellent. Available to Pro and Max users.
- 70 – 84: Published. Available to all users (the 80 production floor still applies).
- 60 – 69: Below the quality gate. Not published.
- < 60: Draft only. Internal use.
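A minimal sketch of how a composite score maps to the bands above (tier names and availability come from the scale; the function itself is illustrative):

```python
def tier(score: float) -> str:
    """Map a composite SupaScore to the bands listed above.

    Publication additionally requires the 80-point production floor.
    """
    if score >= 95:
        return "Expert-verified (Max users)"
    if score >= 85:
        return "Excellent (Pro and Max users)"
    if score >= 70:
        return "Published (all users)"
    if score >= 60:
        return "Below quality gate (not published)"
    return "Draft only (internal use)"
```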
How skills are built
Each skill goes through an 8-phase research pipeline.
1. Domain Scoping: Define the skill's domain, target user, and expected output.
2. Source Research: Collect 6+ sources (books, papers, frameworks, official docs).
3. Methodology Extraction: Identify key frameworks, diagnostic questions, and decision trees.
4. Prompt Drafting: Write the system prompt encoding the methodology.
5. Quality Scoring: Score the skill on the six dimensions using SupaScore.
6. Masterfile Creation: Create the canonical reference document (research + prompt + sources + score).
7. Quality Gate: Automated check (score ≥ 80, sources ≥ 6, masterfile complete).
8. Publication: The skill goes live in the catalogue with version tracking.
The pipeline has run 35+ production sessions with 0 failures.
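A minimal sketch of the phase-7 gate, assuming a skill record carries its composite score, source list, and masterfile (field names are illustrative, not the actual schema):

```python
REQUIRED_MASTERFILE_FIELDS = ("research", "prompt", "sources", "score")  # per phase 6

def passes_quality_gate(score: float, sources: list[str], masterfile: dict) -> bool:
    """Phase-7 automated check: score >= 80, sources >= 6, masterfile complete."""
    return (
        score >= 80
        and len(sources) >= 6
        and all(masterfile.get(field) for field in REQUIRED_MASTERFILE_FIELDS)
    )
```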
Source standards
Every skill cites its research sources. We require a minimum of 6 sources per skill and at least 2 source types, which prevents single-perspective bias.
Sources are displayed as “Research Sources” — we conducted the research, the skill is our original work.
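A minimal sketch of the source-standards check, assuming each source records a title and a type (the Source class and type labels are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Source:
    title: str
    kind: str  # e.g. "book", "paper", "framework", "official docs"

def meets_source_standards(sources: list[Source]) -> bool:
    """At least 6 sources spanning at least 2 distinct source types."""
    return len(sources) >= 6 and len({s.kind for s in sources}) >= 2
```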
What we don't claim
- We don't claim every skill is perfect. The scoring system exists because quality varies.
- We don't claim the score predicts your specific use case. Try the free tier and judge for yourself.
- We don't claim affiliation with or endorsement by any cited author or organisation.
- We don't claim system prompts replace domain expertise. They encode it for faster, more consistent access.
- Our benchmark is a demonstration (5 tests), not a peer-reviewed study. We report patterns and specific examples, not aggregate percentages.
SupaBoost results
We tested 5 Platinum-tier skills head-to-head against vanilla Claude Sonnet. Same prompt, same model, with and without a skill loaded.
The pattern was consistent across all 5 domains: vanilla Claude gives correct but general advice. A skill transforms it into a structured methodology with specific frameworks, templates, code, and monitoring.
This is a demonstration (5 tests, not a peer-reviewed study). Full case studies with before/after comparisons are available on the benchmark page.
Questions about the methodology? Get in touch
Browse Scored Skills
SupaSkills is built by Kill The Dragon, a strategy agency in Vienna.