# Quality Skills
Context: Six automated analysis skills audit every chapter across voice, citations, research, links, openings, and vocabulary. The `review-chapter` skill runs all six and produces a unified dashboard with a weighted health score.
Manual quality review doesn't scale. You can read a chapter and feel that something is off, but you can't reliably detect that the same opening pattern appears in 5 of 7 sections, or that citation density drops by 40% in the back half of the chapter, or that a concept mentioned in Chapter 4 has zero cross-references from Chapter 9. Automated skills catch the patterns humans miss.
## Six Analysis Skills
The `review-chapter` skill orchestrates all six and produces a single dashboard:
| Skill | What It Checks | Score Weight |
|---|---|---|
| `check-voice` | Kill list violations, hedging, AI patterns, approved markers | 25% |
| `check-citations` | Citation density, uncited stats, citation format | 20% |
| `map-research` | Research coverage gaps, orphan research, cross-chapter opportunities | 20% |
| `audit-links` | Cross-chapter links, island sections, hub sections | 15% |
| `audit-openings` | Opening pattern variety, repetitive starts | 10% |
| `analyze-terms` | Overused terms, synonym suggestions | 10% |
The weights reflect priority. Voice is 25% because if the book doesn't sound like one person wrote it, nothing else matters. Citations and research share 40% because credibility is what separates a serious book from AI-generated filler. Links, openings, and terms handle the polish layer.
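The aggregation itself is simple arithmetic. Here is a minimal sketch of how six 0-100 skill scores might roll up under these weights; the function name and input shape are illustrative, not the actual script's API:

```python
# Weights mirror the table above.
WEIGHTS = {
    "check-voice": 0.25,
    "check-citations": 0.20,
    "map-research": 0.20,
    "audit-links": 0.15,
    "audit-openings": 0.10,
    "analyze-terms": 0.10,
}

def health_score(scores: dict[str, float]) -> float:
    """Combine six 0-100 skill scores into one weighted 0-100 score."""
    return round(sum(WEIGHTS[skill] * scores[skill] for skill in WEIGHTS), 1)

# Example: strong voice and terms, weaker linking and openings.
print(health_score({
    "check-voice": 88, "check-citations": 84, "map-research": 80,
    "audit-links": 72, "audit-openings": 75, "analyze-terms": 90,
}))  # 82.1
```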
## Voice Scoring (`check-voice`)
This skill is the most nuanced. It starts with a base score of 70 and adjusts based on what it finds:
| Pattern Type | Score Impact | Max Contribution |
|---|---|---|
| Approved voice patterns ("Here's the thing," "What does X look like?") | +1 each | +15 |
| Personal markers (specific opinions, I've-seen-this moments) | +2 each | +10 |
| Kill list violations ("important to note," "let's delve") | -3 each | No limit |
| Hedging patterns ("somewhat," "arguably," "it could be said") | -2 each | No limit |
| AI signal phrases ("leverage," "comprehensive," "robust") | -2 each | No limit |
Target score: 85+. In practice, first drafts land around 72-78. After the de-AI editing pass, they reach 85-92. The scoring is intentionally punitive on kill list violations because those are the phrases that make readers think "an AI wrote this" -- and once that trust breaks, it doesn't come back.
The skill scans every section file, tallies the patterns, and reports violations with line numbers. It doesn't fix anything. It surfaces problems for the reviewer or editor to address.
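A minimal sketch of the scoring arithmetic, assuming naive substring matching against short illustrative pattern lists; the real skill's lists and matching logic are more involved:

```python
# Stand-in pattern lists; the actual lists are much longer.
APPROVED = ["here's the thing", "what does", "look like?"]
PERSONAL = ["i've seen", "in my experience"]
KILL_LIST = ["important to note", "let's delve"]
HEDGES = ["somewhat", "arguably", "it could be said"]
AI_SIGNALS = ["leverage", "comprehensive", "robust"]

def count(text: str, phrases: list[str]) -> int:
    return sum(text.count(p) for p in phrases)

def voice_score(text: str) -> int:
    text = text.lower()
    score = 70                                   # base score
    score += min(count(text, APPROVED), 15)      # +1 each, capped at +15
    score += min(2 * count(text, PERSONAL), 10)  # +2 each, capped at +10
    score -= 3 * count(text, KILL_LIST)          # -3 each, no limit
    score -= 2 * count(text, HEDGES)             # -2 each, no limit
    score -= 2 * count(text, AI_SIGNALS)         # -2 each, no limit
    return score
```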
## Citation Density (`check-citations`)
The benchmark: 1 citation per 150 words. The actual book averaged 1 per 105 words, denser than most business non-fiction. That density is a direct result of the research-first pipeline making citations cheap.
What the skill checks:
- Uncited statistical claims -- any sentence with a percentage, dollar amount, multiplier, or specific number that lacks a footnote gets flagged
- Under-cited sections -- fewer than 6 citations in a 1,200-word section triggers a warning
- Over-cited sections -- more than 25 citations suggests the section is a research dump rather than synthesized prose
- Format compliance -- footnotes use named keys, references section exists, internal research block is present
The uncited-stats detection isn't perfect. It catches "grew 300%" without a footnote but can't distinguish between the author's analysis ("roughly half the market") and a factual claim that needs sourcing. Human review handles the gray area.
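A rough sketch of these checks, assuming markdown-style named footnotes such as `[^harvey-arr]` and a naive sentence splitter; the regexes and thresholds below are simplified stand-ins for the skill's actual rules:

```python
import re

FOOTNOTE = re.compile(r"\[\^[\w-]+\]")
# Percentages, dollar amounts, multipliers: the "stat-like" claims to check.
STAT = re.compile(r"\$[\d,.]+|\b\d+(?:\.\d+)?%|\b\d+x\b", re.IGNORECASE)

def check_section(text: str) -> list[str]:
    words = len(text.split())
    citations = len(FOOTNOTE.findall(text))
    issues = []
    if words >= 1200 and citations < 6:
        issues.append(f"under-cited: {citations} citations in {words} words")
    if citations > 25:
        issues.append(f"possible research dump: {citations} citations")
    # Flag sentences that contain a number-like claim but no footnote.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if STAT.search(sentence) and not FOOTNOTE.search(sentence):
            issues.append(f"uncited stat: {sentence.strip()[:60]}")
    return issues
```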
## Research Coverage (`map-research`)
This skill compares what's available in the research files against what's used in the draft. It surfaces three things:
- Coverage gaps -- research files with HIGH credibility stats that don't appear anywhere in the chapter
- Orphan research -- completed research that maps to no section (usually from prompt generation that was too broad)
- Cross-chapter opportunities -- research from Chapter 4 that would strengthen an argument in Chapter 9
The cross-chapter detection is the most valuable output. In a 12-chapter book, arguments connect across parts. A stat about AI infrastructure costs from Chapter 4 might be exactly what Chapter 10's operations section needs. Without automated detection, these connections only happen if the author remembers they exist.
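A hedged sketch of the comparison, assuming each research entry records a credibility rating and the chapters it could also support; the data shapes are hypothetical, not the pipeline's real file format:

```python
# research: footnote key -> metadata; chapters: chapter number -> draft text.
def map_research(research: dict[str, dict], chapters: dict[int, str]) -> dict:
    report = {"coverage_gaps": [], "orphans": [], "cross_chapter": []}
    for key, entry in research.items():
        used_in = [n for n, text in chapters.items() if f"[^{key}]" in text]
        if not used_in and entry.get("credibility") == "HIGH":
            report["coverage_gaps"].append(key)   # strong stats, never cited
        elif not used_in:
            report["orphans"].append(key)         # maps to no section at all
        for target in entry.get("related_chapters", []):
            if target not in used_in:
                report["cross_chapter"].append((key, target))  # reuse opportunity
    return report
```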
## Opening Variety (`audit-openings`)
This skill classifies how each section opens and flags repetition:
| Opening Type | Example |
|---|---|
| Question | "What does AI-first actually mean?" |
| Statistic | "Harvey reached $100M ARR in three years." |
| Story | "When Uber hit 2,000 microservices..." |
| Definition | "An AI gateway is the control plane..." |
| Company example | "Figma didn't add AI as a feature." |
| Contrast | "Here's what most people get wrong about..." |
The rule: no pattern should appear more than twice per chapter. When Chapter 3 came back with 5 of 7 sections starting with "Here's the thing," this skill caught it in 10 seconds. A human reviewer might not -- especially after reading 40 sections across multiple chapters.
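A small sketch of how the classification and the twice-per-chapter rule could work, using deliberately crude heuristics; the patterns below are illustrative, not the skill's actual rules:

```python
import re
from collections import Counter

def classify_opening(sentence: str) -> str:
    """Classify a section's first sentence with rough keyword heuristics."""
    s = sentence.strip()
    if s.endswith("?"):
        return "question"
    if s.lower().startswith(("when ", "back in ", "the first time ")):
        return "story"
    if re.search(r"\$[\d,.]+|\d+%|\b\d{1,3}(?:,\d{3})+\b", s):
        return "statistic"
    if re.search(r"\bis (a|an|the)\b", s):
        return "definition"
    if "wrong about" in s.lower() or s.lower().startswith("here's what"):
        return "contrast"
    return "other"

def audit_openings(first_sentences: list[str]) -> list[str]:
    counts = Counter(classify_opening(s) for s in first_sentences)
    # The rule above: no opening type more than twice per chapter.
    return [f"overused opening: {kind} ({n}x)"
            for kind, n in counts.items() if n > 2]
```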
## Health Score Dashboard
The `review-chapter` skill aggregates all six scores into a weighted health score on a 0-100 scale. Reports are saved to `reports/<Draft>/chapter-reviews/` and include:
- Overall health score with trend tracking (shows delta from last review)
- Metric breakdown table with individual scores
- Top 10 priority issues ranked by impact
- Quick action commands -- which scripts to run to fix each issue
```bash
# Full review of chapter 8
python .claude/skills/review-chapter/scripts/review_chapter.py --draft "Draft 1" 8

# Quick scores only (no detailed breakdown)
python .claude/skills/review-chapter/scripts/review_chapter.py --draft "Draft 1" 8 --quick

# Compare chapter against book average
python .claude/skills/review-chapter/scripts/review_chapter.py --draft "Draft 1" 8 --compare
```
The `--compare` flag is useful for spotting outlier chapters. If 11 chapters score 82-88 and one scores 71, you know where to focus editorial time.
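The outlier logic behind that comparison is straightforward. A minimal sketch, with the gap threshold chosen purely for illustration:

```python
# Flag chapters scoring well below the book average.
def outlier_chapters(scores: dict[int, float], gap: float = 8.0) -> list[int]:
    average = sum(scores.values()) / len(scores)
    return [ch for ch, score in scores.items() if average - score > gap]
```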
## What Automated Quality Can't Do
These skills catch patterns. They don't catch judgment calls. They'll flag that a section is under-cited but won't know whether the section is the author's personal analysis (where citations are inappropriate) or a factual claim that needs sourcing. They'll detect AI-pattern phrases but can't distinguish between "leverage" used as corporate jargon (bad) and "leverage" used to describe an actual mechanical lever (fine).
The quality pipeline reduces human review time by catching the mechanical stuff. It doesn't replace human review. Both are required. Neither is optional.
Deep dives: Voice Drift Prevention | Review Philosophy