Performance · benchmark · comparison · hard-nuts

10 Questions Where Generic AI Fails (And Expert AI Doesn't)

Max Jürschik · February 23, 2026 · 8 min read

Generic AI has a specific failure mode. It does not say "I don't know." It does not flag uncertainty. It gives you an answer that reads like expertise, is formatted like expertise, and is wrong in ways that matter.

We tested 10 hard questions — the kind professionals face in their actual work — across legal, finance, security, engineering, and more. For each one, we ran the question through vanilla Claude and through a domain-specific expert skill. Same model. Same question. Different instructions.

The pattern was consistent: generic AI gets the shape of the answer right but misses the substance. Here are all ten.

1. Legal: Non-Compete Enforceability

The question: "Is this non-compete clause enforceable for a software engineer in California who is moving to a competitor in Germany?"

Vanilla Claude said: The clause may be enforceable depending on jurisdiction. California generally disfavors non-competes, but enforcement can vary. Consult a lawyer for jurisdiction-specific advice.

Expert skill said: This non-compete is unenforceable for the California employment period under California Business and Professions Code Section 16600, which voids non-competes with narrow exceptions (sale of business). For the German employment period, enforceability depends on whether the clause includes a Karenzentschädigung (mandatory compensation of at least 50% of last salary during the restriction period) per HGB Section 74. Without this compensation commitment, German courts will void the clause. Additionally, the 24-month duration sits exactly at the German statutory maximum of two years (HGB Section 74a), and the restriction binds only if the compensation is paid. Cross-border enforcement between US and EU jurisdictions has no treaty basis: a judgment obtained in California is not automatically enforceable in Germany, and vice versa.

Why it matters: The vanilla answer tells you to ask a lawyer. The expert answer tells you what the lawyer will say, with the specific statutes, so you can negotiate from an informed position before spending EUR 400/hour on outside counsel.

2. Finance: SaaS Unit Economics

The question: "Our LTV/CAC ratio is 2.8. Is that good?"

Vanilla Claude said: An LTV/CAC ratio of 2.8 is generally considered healthy. The typical benchmark is 3:1 or higher, so you are slightly below the ideal target. Focus on improving customer lifetime value or reducing acquisition costs.

Expert skill said: A 2.8 LTV/CAC requires context. Key questions: What is your payback period? A 2.8 ratio with 6-month payback is strong — it means fast capital recycling. A 2.8 with 24-month payback may signal a cash flow problem even if the ratio looks acceptable. What is your blended vs. channel-specific ratio? If organic is 8.0 and paid is 1.4, the blended 2.8 masks that half your spend is destroying value. How are you calculating LTV? If you are using average revenue times average lifetime without segmenting by cohort, you are likely overstating LTV by 15-30% due to survivorship bias. At your ARR range (sub-EUR 5M based on the question context), investors will care more about payback period (target: under 12 months) and gross margin (target: above 70%) than LTV/CAC.

Why it matters: The benchmark answer (3:1 is ideal) is the kind of knowledge you pick up from any blog post. The expert answer identifies the three ways the headline number can mislead you.
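
The blended-versus-channel point is easy to check with arithmetic. The sketch below is illustrative only: the `Channel` type and every spend, customer, LTV, and margin figure are hypothetical, chosen so that organic comes out at 8.0, paid at 1.4, and the blend at 2.8. Payback is computed as CAC divided by monthly gross margin per customer, which is one common definition and an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One acquisition channel. All figures used below are hypothetical."""
    name: str
    spend: float           # total acquisition spend, EUR
    customers: int         # customers acquired with that spend
    ltv: float             # lifetime value per customer, EUR
    monthly_margin: float  # gross margin per customer per month, EUR

    @property
    def cac(self) -> float:
        return self.spend / self.customers

    @property
    def ltv_cac(self) -> float:
        return self.ltv / self.cac

    @property
    def payback_months(self) -> float:
        # Assumed definition: months of gross margin needed to recover CAC.
        return self.cac / self.monthly_margin


def blended_ltv_cac(channels):
    """Total lifetime value acquired divided by total acquisition spend."""
    total_ltv = sum(c.ltv * c.customers for c in channels)
    total_spend = sum(c.spend for c in channels)
    return total_ltv / total_spend


channels = [
    Channel("organic", spend=14_000, customers=70, ltv=1_600, monthly_margin=50),
    Channel("paid", spend=52_000, customers=52, ltv=1_400, monthly_margin=50),
]

for c in channels:
    print(f"{c.name}: LTV/CAC {c.ltv_cac:.1f}, payback {c.payback_months:.0f} months")
print(f"blended: LTV/CAC {blended_ltv_cac(channels):.1f}")
```

With these made-up inputs the blended ratio is 2.8, while the paid channel alone sits at 1.4 with a 20-month payback. The headline number looks the same either way; only the per-channel view shows where the money goes.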

3. Security: SOC 2 Readiness

The question: "We need SOC 2 Type II certification. What do we need to do?"

Vanilla Claude said: SOC 2 Type II requires implementing controls across five trust service criteria: Security, Availability, Processing Integrity, Confidentiality, and Privacy. You will need to engage an auditor, implement an information security policy, and maintain documentation of your controls over a review period of at least 6 months.

Expert skill said: Start with scope selection. Most startups pursue Security only for their first SOC 2 — adding all five trust service criteria increases audit cost by 40-60% and doubles the control count. For a SaaS company, focus on Security (mandatory) and Availability (expected by enterprise buyers). Processing Integrity, Confidentiality, and Privacy can wait for Year 2. Concrete steps: (1) Deploy an endpoint detection and response tool (CrowdStrike, SentinelOne) — auditors check for it by name. (2) Implement centralized logging with 90-day retention minimum. (3) Enable SSO and MFA for all production systems — shared credentials are an automatic finding. (4) Background checks for employees with production access. (5) Quarterly access reviews with documented evidence. (6) Vulnerability scanning (weekly) and penetration testing (annual). (7) Incident response plan with tabletop exercise. The observation period is 3-12 months. Target 6 months. Budget EUR 25-50K for the audit itself, EUR 10-20K for a readiness platform (Vanta, Drata, Secureframe). Timeline from zero to certified: 9-12 months if you start now.

Why it matters: The generic answer describes the certification. The expert answer gives you the implementation plan, budget, and timeline. One is a Wikipedia article. The other is a project kickoff.

4. Data Analysis: Cohort Interpretation

The question: "Our Month 3 retention dropped from 68% to 61% in the last two cohorts. Should we be worried?"

Vanilla Claude said: A drop from 68% to 61% in Month 3 retention is notable and worth investigating. This could indicate product-market fit issues, onboarding problems, or seasonal factors. Consider analyzing the cohorts more closely to identify potential causes.

Expert skill said: A 7-percentage-point drop in M3 retention across two consecutive cohorts is a significant signal. Context checks: (1) Did acquisition channels shift? If you increased paid spend, you may be acquiring lower-intent users who churn faster — check M3 retention by acquisition source. (2) Did you ship product changes in the activation window? Changes to onboarding or core flows in the 0-90 day window directly impact M3. (3) Check the cohort sizes. If the recent cohorts are 2x larger from a campaign, the retention drop may reflect audience quality, not product quality. Action: Segment both cohorts by activation milestone completion. If activated users retain at 68% in both cohorts but the activation rate dropped, your problem is onboarding, not retention. If activated users also retain worse, you have a core product issue. Do not wait for Month 6 data — the pattern is already actionable.

Why it matters: "Investigate further" is not analysis. Telling you exactly what to segment, what to compare, and what the answer means — that is analysis.
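
The segmentation step can be sketched as a small decomposition: overall M3 retention is split into the activation rate and the retention of activated users. The `CohortCounts` type, the `diagnose` helper, the 2-point tolerance, and all user counts are hypothetical, picked so the two cohorts land on 68% and 61%.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """User counts for one signup cohort at Month 3 (hypothetical data)."""
    activated: int
    activated_retained: int
    not_activated: int
    not_activated_retained: int

    @property
    def size(self) -> int:
        return self.activated + self.not_activated

    @property
    def activation_rate(self) -> float:
        return self.activated / self.size

    @property
    def activated_retention(self) -> float:
        return self.activated_retained / self.activated

    @property
    def overall_retention(self) -> float:
        return (self.activated_retained + self.not_activated_retained) / self.size


def diagnose(old, new, tolerance=0.02):
    """Classify a retention drop. The tolerance is an arbitrary assumption."""
    retention_shift = new.activated_retention - old.activated_retention
    activation_shift = new.activation_rate - old.activation_rate
    if retention_shift < -tolerance:
        return "core product: activated users retain worse"
    if activation_shift < -tolerance:
        return "onboarding: fewer users reach activation"
    return "no clear shift at this tolerance"


old = CohortCounts(activated=240, activated_retained=192,
                   not_activated=60, not_activated_retained=12)
new = CohortCounts(activated=205, activated_retained=164,
                   not_activated=95, not_activated_retained=19)

print(f"overall M3: {old.overall_retention:.0%} -> {new.overall_retention:.0%}")
print(f"activation: {old.activation_rate:.0%} -> {new.activation_rate:.0%}")
print(diagnose(old, new))
```

In this invented example, activated users retain equally well in both cohorts, so the whole drop traces back to fewer users activating.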

5. Compliance: GDPR Data Subject Request

The question: "A user in France submitted a data deletion request. We use 4 third-party processors. What's the process?"

Vanilla Claude said: Under GDPR Article 17, you must delete the user's personal data without undue delay, typically within 30 days. You should also inform your third-party processors to delete the data. Document the process for compliance purposes.

Expert skill said: The timeline is one month from receipt of a verifiable request (Article 12(3)), extendable by two further months for complex cases, provided you notify the data subject of the extension within the first month. Process: (1) Verify identity. You need reasonable confirmation this is the actual data subject, not someone social engineering deletion. (2) Check for exemptions. Article 17(3) allows retention for legal obligation compliance, public interest, or legal claims defense. Financial transaction records may have retention obligations under French commercial law (Code de Commerce, 10 years for accounting records). (3) For your 4 processors: you must issue deletion instructions under your DPA terms. Under Article 19, you must also notify each recipient the data was disclosed to, and under Article 17(2), other controllers if you made the data public. Track processor confirmations. If a processor cannot delete immediately (e.g., backup retention cycles), document the timeline for complete deletion. (4) French specificity: CNIL expects you to delete from backups within the backup rotation cycle, not immediately, but you must document the expected deletion date. (5) Respond to the data subject confirming deletion, listing categories deleted, and noting any retained data with legal basis for retention. Keep the response record for 3 years (CNIL recommendation).

Why it matters: Missing the identity verification step exposes you to social engineering attacks. Missing the retention exemption means you might delete legally required financial records. Missing the CNIL-specific backup guidance means you might over-commit on timelines you cannot meet.
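
Article 12(3) states the response deadline in calendar months (one month, extendable by two further months), so a deadline tracker should add months rather than a fixed number of days. The sketch below is a minimal illustration; `add_months` and `erasure_deadlines` are hypothetical helpers, and clamping to the last day of a shorter month is an assumption about how to handle dates such as January 31.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def erasure_deadlines(received: date) -> dict:
    """Deadlines counted from receipt of a verified request."""
    return {
        # Standard deadline; also the latest date to announce an extension.
        "respond_by": add_months(received, 1),
        # One month plus two further months for complex cases.
        "extended_respond_by": add_months(received, 3),
    }


deadlines = erasure_deadlines(date(2026, 1, 31))
for label, due in deadlines.items():
    print(label, due.isoformat())
```

The same record could hold one expected-deletion date per processor, so that backup rotation timelines are documented next to the statutory deadline.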

6. Architecture: Microservices Migration

The question: "We have a Django monolith with 200K LOC. Should we migrate to microservices?"

Vanilla Claude said: Microservices offer benefits like independent scaling, technology flexibility, and team autonomy. However, they add complexity in deployment, monitoring, and data management. For a 200K LOC application, consider starting with a modular monolith before full microservices migration.

Expert skill said: At 200K LOC with a single team (likely, given you are asking this question rather than having a platform team make the call), the answer is almost certainly no. Microservices solve organizational scaling problems — multiple teams needing independent deploy cycles. They do not solve code quality problems, and they turn a function call into a network call with failure modes you currently do not have. Instead: (1) Identify the 2-3 bounded contexts in your Django app that have independent data models and different scaling profiles (e.g., a notification system that spikes, a reporting module that runs heavy queries). (2) Extract those as separate Django apps within the same repo, communicating via well-defined interfaces. (3) If one module genuinely needs independent scaling, extract that single piece as a service. Keep the rest monolithic. The "strangler fig" pattern works here. Cost of premature microservices at your scale: 3-6x increase in infrastructure complexity, need for service mesh or API gateway, distributed tracing tooling, and a deployment pipeline that currently does not exist. You will spend more time on infrastructure than product.

Why it matters: The generic answer weighs pros and cons. The expert answer gives you a specific decision (no) with a specific alternative (modular monolith, extract by scaling profile) and quantifies the cost of the wrong choice.

7. Marketing: Attribution Model

The question: "We spend EUR 40K/month on Google Ads and EUR 15K on content. Google Ads shows 3x ROAS. Should we shift more budget to ads?"

Vanilla Claude said: A 3x ROAS on Google Ads is strong. If you can maintain this return at higher spend levels, shifting more budget to ads could accelerate growth. However, consider diminishing returns at higher spend levels and the long-term SEO benefits of content marketing.

Expert skill said: Do not shift budget based on last-click attribution from Google Ads. The 3x ROAS is almost certainly overstated because: (1) Google Ads claims credit for users who would have converted organically — run a brand keyword holdout test. Pause brand keywords for 2 weeks and measure the actual conversion drop. Most companies find 40-60% of "brand search" conversions happen anyway. (2) Your EUR 15K content spend likely assists the ads funnel. Check your multi-touch attribution. If content is the first touch for users who later convert via paid, your content ROAS is being attributed to ads. (3) At EUR 40K/month, check your impression share. If you are already at 80%+ impression share on high-intent keywords, additional spend goes to broader (lower-intent) terms at worse economics. Your actual test: Hold ad spend constant for 30 days while doubling content budget. Measure blended CAC, not channel-specific ROAS. If blended CAC improves, content is the better marginal investment.

Why it matters: Following the vanilla advice could mean burning EUR 20K+/month on incremental ad spend that cannibalizes organic conversions. The expert answer gives you a concrete test to run before making the allocation decision.
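
Both tests reduce to simple arithmetic once the data is in. The sketch below assumes hypothetical conversion and customer counts; only the EUR 40K and EUR 15K spend figures come from the question. `incremental_share` and `blended_cac` are illustrative helper names, not part of any analytics tool.

```python
def incremental_share(attributed, baseline_total, holdout_total):
    """Share of ad-attributed conversions that vanish when the ads are paused.

    attributed:     conversions the ad platform credited to the paused keywords
    baseline_total: all conversions in a comparable period with ads running
    holdout_total:  all conversions during the holdout period
    """
    lost = baseline_total - holdout_total
    return max(0.0, min(1.0, lost / attributed))


def blended_cac(ad_spend, content_spend, new_customers):
    """Total marketing spend divided by all new customers, any channel."""
    return (ad_spend + content_spend) / new_customers


# Brand keyword holdout (hypothetical counts for a two-week window).
share = incremental_share(attributed=200, baseline_total=500, holdout_total=400)
print(f"incremental share of brand conversions: {share:.0%}")

# Content budget test: ad spend held constant, content doubled (hypothetical outcomes).
before = blended_cac(ad_spend=40_000, content_spend=15_000, new_customers=110)
after = blended_cac(ad_spend=40_000, content_spend=30_000, new_customers=175)
print(f"blended CAC: EUR {before:.0f} -> EUR {after:.0f}")
```

In this invented scenario half of the brand conversions would have happened without ads, and blended CAC falls from EUR 500 to EUR 400 after doubling content. Real results depend on seasonality and on how comparable the two periods are.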

8. DevOps: Incident Response

The question: "Our API is returning 502 errors affecting 15% of requests. What's the triage process?"

Vanilla Claude said: A 502 Bad Gateway error typically means the upstream server is not responding correctly. Check your server logs, verify your load balancer configuration, and ensure your backend services are running. Consider scaling up your infrastructure if the issue is related to load.

Expert skill said: Immediate triage (first 5 minutes): (1) Check if the 502s correlate with specific endpoints or all endpoints. If specific, you likely have one failing service, not a systemic issue. (2) Check your load balancer health checks — are backends being marked unhealthy? If health checks pass but requests fail, you have an intermittent issue (connection pool exhaustion, memory pressure, or slow upstream dependencies). (3) Check request latency percentiles. If p99 spiked, you likely have a slow query or external API timeout that is exhausting your connection pool. (4) Check recent deployments. If a deploy happened in the last 2 hours, roll back first, investigate second. Diagnosis path by pattern: Consistent 502s = backend is down, check process/container status. Intermittent 502s = resource exhaustion, check connection pools, file descriptors, memory. 502s during traffic spikes = scaling limits, check autoscaling triggers and instance counts. 502s for specific users = session affinity or data-dependent code path, check error logs with user context. Communication: at 15% error rate, post to your status page within 10 minutes. Do not wait for root cause. "We are investigating elevated error rates" is better than silence.

Why it matters: "Check your logs" is not a triage process. A triage process tells you what to check, in what order, and what each finding means. The expert answer gets you from alert to action in a structured way instead of ad-hoc debugging.
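
The diagnosis path can be written down as a lookup table so that the on-call engineer follows the same order every time. This is a sketch of the mapping described above, not a monitoring integration: the `triage` function, the pattern labels, and the treatment of 15% as a status-page threshold are all assumptions for illustration.

```python
# Pattern -> likely cause and next check, taken from the triage text above.
NEXT_CHECK = {
    "consistent": "backend is down: check process/container status",
    "intermittent": "resource exhaustion: check connection pools, "
                    "file descriptors, memory",
    "traffic_spike": "scaling limits: check autoscaling triggers "
                     "and instance counts",
    "specific_users": "session affinity or data-dependent code path: "
                      "check error logs with user context",
}


def triage(pattern, minutes_since_deploy=None, error_rate=0.0,
           status_page_threshold=0.15):
    """Return ordered triage steps for a 502 incident.

    pattern must be one of the NEXT_CHECK keys. The status page threshold
    is an assumed cut-off, not an industry standard.
    """
    steps = []
    # A deploy in the last 2 hours: roll back first, investigate second.
    if minutes_since_deploy is not None and minutes_since_deploy <= 120:
        steps.append("roll back the recent deploy, then investigate")
    steps.append(NEXT_CHECK[pattern])
    if error_rate >= status_page_threshold:
        steps.append("post 'investigating elevated error rates' "
                     "to the status page within 10 minutes")
    return steps


for step in triage("intermittent", minutes_since_deploy=45, error_rate=0.15):
    print("-", step)
```

Keeping the table as data rather than nested conditionals makes it easy to review in a postmortem and extend with new patterns.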

9. Content Strategy: B2B Positioning

The question: "We're launching a developer tool for API testing. How do we differentiate from Postman?"

Vanilla Claude said: To differentiate from Postman, focus on your unique features, target a specific niche, improve developer experience, and build a strong community. Consider emphasizing performance, pricing, or specific integrations that Postman lacks.

Expert skill said: Do not try to be a better Postman. Postman has 30M+ users and a decade of network effects. Your differentiation paths: (1) Pick one workflow and own it. Postman is broad. If you do contract testing better than anyone (schema validation, breaking change detection, CI integration in under 5 minutes), you have a wedge. Developers adopt tools that solve a specific pain, not tools that are generally better. (2) Pricing as a feature. Postman's paid tiers gate collaboration features. If your collaboration is free and your paid tier is usage-based, you attract teams who hit Postman's paywall at exactly the point they are most locked in. (3) Go where Postman is weak: CLI-first workflows. Postman is GUI-centric. If your tool is a CLI that generates test suites from OpenAPI specs and runs in CI with zero config, you serve the GitOps crowd that Postman does not prioritize. (4) Your launch content should be comparison-oriented: "API contract testing: Postman vs. [You] — 5 tests in 2 minutes." Developers trust benchmarks, not feature lists. Ship the comparison before competitors do.

Why it matters: "Focus on your unique features" assumes you know what they are. The expert answer identifies specific strategic wedges with reasoning for why each one works against an incumbent.

10. HR: Employee Termination in Germany

The question: "We need to terminate a software developer in our Berlin office. They've been employed for 3 years. What's the process?"

Vanilla Claude said: In Germany, employee termination requires a notice period based on length of employment. For 3 years, the standard notice period is typically 1-3 months. Consult with a local employment lawyer to ensure compliance with German labor law.

Expert skill said: For a 3-year tenure, the statutory minimum notice period is 1 month to the end of a calendar month (BGB Section 622(2)). Check the employment contract, which may specify longer. Process requirements: (1) If the establishment regularly employs more than 10 people (part-time staff count fractionally under KSchG Section 23), the Kündigungsschutzgesetz (dismissal protection act) applies. You need a legally recognized reason: conduct-related (requires prior warning, typically 2 warnings for the same issue), person-related (long-term illness, loss of qualification), or business-related (redundancy with social selection criteria). Without a valid reason, the termination is void: the employee can sue within 3 weeks and will win reinstatement or settlement. (2) If you have a Betriebsrat (works council), you must consult them before issuing the termination. Failure to consult makes the termination void regardless of the reason. Consultation period: 1 week for ordinary termination. (3) Termination must be in writing (wet signature, not email, not DocuSign; BGB Section 623). An electronic termination is legally void. (4) Severance is not legally required for ordinary termination but is standard practice. Market rate in Germany: 0.5 monthly salaries per year of employment, so 1.5 months for 3 years. This is also the formula used by German labor courts (Arbeitsgericht) as the standard settlement amount. (5) Consider an Aufhebungsvertrag (mutual termination agreement) instead. It is faster and avoids litigation risk, but the employee must agree and should have 3 business days to consider.

Why it matters: Sending a termination via email voids it. Missing the works council consultation voids it. Not having a legally recognized reason means you lose in court. The vanilla answer's suggestion to "consult a lawyer" is correct but does not tell you that 4 of the 5 most common mistakes in German terminations are procedural — and all of them are avoidable if you know the process.
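
The notice period and the settlement rule of thumb are both mechanical enough to compute. The sketch below encodes the employer-side tenure steps of BGB Section 622(2) and the 0.5-salaries-per-year formula quoted above. The function names and the EUR 6,000 monthly salary are hypothetical, and contract or collective-agreement terms can override the statutory minimum.

```python
# (minimum years of service, notice in months to end of calendar month)
# Employer-side statutory steps under BGB Section 622(2).
BGB_622_2 = [(20, 7), (15, 6), (12, 5), (10, 4), (8, 3), (5, 2), (2, 1)]


def employer_notice_months(years_employed):
    """Statutory minimum notice in months, or None below two years of service.

    Under two years the basic four-week period of Section 622(1) applies,
    which is not modeled here.
    """
    for min_years, months in BGB_622_2:
        if years_employed >= min_years:
            return months
    return None


def settlement_estimate(monthly_salary, years_employed, factor=0.5):
    """Rule-of-thumb severance: factor x monthly salary x years of service."""
    return factor * monthly_salary * years_employed


years = 3
salary = 6_000  # hypothetical gross monthly salary, EUR
print(f"notice: {employer_notice_months(years)} month(s) to end of month")
print(f"settlement estimate: EUR {settlement_estimate(salary, years):,.0f}")
```

For the 3-year case this gives one month of notice and, at the assumed salary, a settlement estimate of EUR 9,000, which is 1.5 monthly salaries.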

The Pattern

Across all ten questions, the failure mode is the same. Generic AI:

  • Gives you the textbook answer, not the practitioner answer
  • Sounds complete but omits the details that determine the outcome
  • Defaults to "consult an expert" instead of being the expert
  • Cannot tell you what it does not know

Expert skills close these gaps because they carry domain-specific frameworks, jurisdiction-aware knowledge, and structured reasoning that turns general intelligence into specific expertise. The model is the same. The difference is what it is told to focus on, what standards to apply, and what patterns to look for.

You do not need AI that sounds smart. You need AI that knows what it is talking about.
