Blog

What separates expert AI output from the generic kind. Performance data, integration guides, and industry perspectives.


Featured · engineering · May 18, 2026 · 4 min read

Banger Drought

Building in public is easy when you ship the screenshot-worthy stuff. Here's what the unglamorous in-between weeks look like — security tightening, data integrity, monitoring, infrastructure hardening. The work that makes the next round of features survivable.

transparency, infrastructure, building-in-public, foundations, small-team

engineering · Mar 27, 2026 · 8 min read

Why We Rewrote All 1,300 Skills (Twice)

A user tested our Code Review skill against a competitor. We lost. What followed was two rounds of rewrites, a per-skill health check, and a fundamental rethinking of what makes a skill useful.

v5, quality, prompt-engineering, benchmark

research · Mar 24, 2026 · 8 min read

We Ran 819 API Calls to Find Claude's Signature Catchphrases

We built a simulator that fed 40 developer scenarios into Claude Sonnet across 7 languages. Then we asked Claude to analyze its own output. 332 catchphrases later, we know exactly which phrases Claude reaches for - and why it matters.

experiment, claude, motivation, sycophancy

Industry · Mar 23, 2026 · 5 min read

Stop Reading 'Top 5 Claude Code Skills' Articles

Every week, another listicle tells you the '10 skills you need.' They're all wrong. Here's why modern projects need hundreds of specialized skills, not five generic ones.

opinion, skills, engineering

Industry · Mar 23, 2026 · 5 min read

The Subagent Void: Why Your AI Sub-Agents Are Working Blind

You spawn sub-agents for parallel work. They start with zero expertise. Here's how to fix that with dynamic skill loading.

subagents, skills, mcp, architecture

Performance · Mar 22, 2026 · 6 min read

We Tested It: Does Loading the Same Skill (Prompt) Twice Make AI Better?

We ran a controlled experiment: no skill, one skill, the same skill loaded twice, and two similar skills combined. The results surprised us. Double-stacking improved quality by 8% - but at 2x the token cost.

experiment, prompt-engineering, skills, benchmark

Performance · Mar 21, 2026 · 12 min read

Introducing SkillStreaming: Dynamic Expertise Retrieval Across 1,000+ AI Skills

We decomposed 1,279 AI skills into 13,381 retrievable fragments and built a system that assembles cross-domain expertise on every turn. Same concept coverage, 63% fewer tokens, zero manual skill selection.

skillstreaming, rag, retrieval, subskills

Performance · Mar 16, 2026 · 8 min read

The Ecosystem Audit: Scoring 167 Community Agent Skills

We scored 167 community-built Claude Code skills from 40+ organisations using the same SupaScore rubric we apply to our own. The tier distribution tells a clear story about what quality infrastructure adds.

benchmark, ecosystem, quality, community

Performance · Mar 15, 2026 · 10 min read

What Deep Research Adds to Claude's Built-In Skills: A Data Comparison

We scored Anthropic's 21 Claude Code skills alongside our closest equivalents using the same rubric. The data shows where domain research and quality infrastructure make a measurable difference.

benchmark, anthropic, quality, skills

Performance · Mar 15, 2026 · 8 min read

How Safety Skills Improve Claude's Responses in Sensitive Domains: A 68-Query Benchmark

We benchmarked Claude with and without safety skills on 68 real-world queries in sensitive domains. 6 scoring dimensions, 10 domains, 272 API calls. Skill-augmented responses scored 26.8% higher with a 96% win rate.

safety, eval, benchmark, society

Industry · Mar 13, 2026 · 4 min read

The Prompt Quality Problem Nobody Talks About

Everyone talks about model quality. Nobody talks about prompt quality. But the prompt determines 80% of output quality.

prompts, quality, standards, opinion

Performance · Mar 12, 2026 · 5 min read

How We Tune AI: From Generic to Expert in 6 Dimensions

The instrument is the same. But untuned, it sounds wrong. Here's what tuning AI actually means, and what it changes in your output.

quality, methodology, comparison

Industry · Mar 12, 2026 · 12 min read

We Rated 22 Viral Vibe Coding Tips: Here's What Actually Works

We analyzed 22 widely-shared AI coding tips from Boris Cherny, HumanLayer, Addy Osmani, and others. Scored each on measurability, security, context-cost, and portability. The results might surprise you.

vibe-coding, claude-code, ai-coding, best-practices

SEO · Mar 11, 2026 · 7 min read

Best Claude Skills for Legal and Compliance (2026)

The top-scored legal and compliance skills. Contract review, GDPR, employment law, audit. The expert in the room you can't afford to hire.

legal, compliance, gdpr, contracts

Performance · Mar 10, 2026 · 8 min read

We Rebuilt All 1,078 Skills. Here's What 143 Hours of AI Told Us.

After our 10-skill pilot proved the framework, we ran the full pipeline. 1,070 skills rebuilt, average score up 3.9 points, 97% now Platinum. The results changed how we think about AI quality at scale.

pipeline, quality, benchmark, v2

SEO · Mar 6, 2026 · 7 min read

Best Claude Skills for Marketing Teams (2026)

The 12 highest-scored marketing and business skills. Strategy, content, analytics, growth, grouped by what you actually need.

marketing, content, analytics, growth

SEO · Mar 5, 2026 · 7 min read

Best Claude Skills for Software Engineers (2026)

The 12 highest-scored engineering skills on SupaSkills. Grouped by use case: code review, architecture, DevOps, testing, security.

engineering, code-review, devops, architecture

Industry · Mar 4, 2026 · 6 min read

What Happens When AI Skills Go Rogue

Unvetted system prompts can contain data exfiltration instructions, prompt injection, and credential harvesting. The ecosystem needs standards.

security, safety, quality

Tutorial · Feb 27, 2026 · 3 min read

MCP in 30 Seconds: Expert Skills in Claude Code

Copy-paste the config. Load a skill. Ask a domain question. You're live in under 60 seconds.

mcp, claude-code, setup, tutorial

Performance · Feb 26, 2026 · 5 min read

The Hidden Cost of Bad AI Advice

Bad AI advice isn't free. It costs decisions. A wrong LTV:CAC calculation. A missed compliance deadline. A contract clause nobody flagged.

business, finance, risk

Industry · Feb 25, 2026 · 4 min read

System Prompts Are the New Codebase

You version your code. You test your code. You review your code. Your system prompts get none of that. Here's why that's a problem.

prompts, engineering, opinion

Performance · Feb 24, 2026 · 7 min read

We Rebuilt 10 Skills with 4 AI Models. The Model Mattered Less Than We Expected.

We tested Gemini 3.1 Pro, Claude Opus 4.6, and a tag-team approach against our current pipeline. The framework gave 5x more improvement than the model swap.

multi-model, pipeline, quality, benchmark

Performance · Feb 23, 2026 · 8 min read

10 Questions Where Expert Skills Outperform Generic Prompts

We tested 10 hard questions across legal, finance, security, and engineering. Expert-guided prompts consistently outperformed generic prompts on the details that matter.

benchmark, comparison, hard-nuts

Performance · Feb 23, 2026 · 6 min read

How SupaScore Works: 6 Dimensions That Separate Good from Dangerous

What happens when you use an AI skill scored 62 versus one scored 87. The difference isn't academic. It's your next business decision.

supascore, quality, methodology

Performance · Feb 23, 2026 · 5 min read

What Expert Skills Catch in Contracts That Generic AI Misses

A SaaS contract review where an expert legal skill caught three deal-breaking clauses that a generic prompt missed. Here's what happened.

legal, contracts, benchmark