Citation Auditor Template

Pseudocode and logic for building a citation audit script that checks density, finds uncited claims, and detects common footnote problems.

Purpose

Audit citation density across your manuscript and flag statistical claims that lack footnotes.

What to Detect

Citation References

Two patterns to match:

Pattern	Location	Example
`[^tag-name]`	Body text (inline reference)	`reached $100M ARR[^harvey-arr]`
`[^tag-name]: ...`	References section (definition)	`[^harvey-arr]: Harvey 2024...`
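
The two patterns can be told apart with a lookahead and a line anchor. This is a minimal sketch, assuming footnote tags use only word characters and hyphens and that definitions start at the beginning of a line; the names `INLINE_REF`, `DEFINITION`, and `sample` are illustrative, not part of any existing script.

```python
import re

# [^tag] that is NOT immediately followed by ':' counts as an inline reference.
INLINE_REF = re.compile(r"\[\^([\w-]+)\](?!:)")
# [^tag]: at the start of a line counts as a definition (needs MULTILINE for '^').
DEFINITION = re.compile(r"^\[\^([\w-]+)\]:", re.MULTILINE)

# Illustrative sample text built from the examples in the table above.
sample = (
    "Harvey reached $100M ARR[^harvey-arr] and kept growing[^missing-def].\n"
    "\n"
    "[^harvey-arr]: Harvey 2024...\n"
    "[^unused]: Never referenced in the body.\n"
)

# Capture tag names only, so references and definitions compare directly.
inline_tags = INLINE_REF.findall(sample)
defined_tags = DEFINITION.findall(sample)

orphaned_refs = sorted(set(inline_tags) - set(defined_tags))  # referenced, never defined
orphaned_defs = sorted(set(defined_tags) - set(inline_tags))  # defined, never referenced

print("inline:", inline_tags)
print("defined:", defined_tags)
print("orphaned refs:", orphaned_refs)
print("orphaned defs:", orphaned_defs)
```

Capturing the tag name rather than the whole match is what makes the orphan check work: the raw strings `[^tag]` and `[^tag]:` never compare equal.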

Uncited Statistical Claims

Text matching any of these regex patterns should typically have a citation nearby:

Percentages:        \d+(\.\d+)?%
Dollar amounts:     \$[\d.,]+\s*(million|billion|M|B|K)?
Large numbers:      \d{1,3}(,\d{3})+
Multipliers:        \d+x\s+(faster|slower|more|better|cheaper)
Ratios:             \d+\s+out of\s+\d+
Growth phrases:     doubled|tripled|grew by
Time specifics:     in\s+\d{4}|since\s+\d{4}
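
One way to apply these patterns is to scan the body text and keep only matches with no footnote marker close by. The sketch below assumes a window of 50 characters on either side of the match and reports 1-based line numbers; the function name `find_uncited_claims` and the dictionary labels are hypothetical.

```python
import re

# Patterns copied from the list above; the dictionary keys are illustrative labels.
STAT_PATTERNS = {
    "percentage": r"\d+(\.\d+)?%",
    "dollar_amount": r"\$[\d.,]+\s*(million|billion|M|B|K)?",
    "large_number": r"\d{1,3}(,\d{3})+",
    "multiplier": r"\d+x\s+(faster|slower|more|better|cheaper)",
    "ratio": r"\d+\s+out of\s+\d+",
    "growth_phrase": r"doubled|tripled|grew by",
    "time_specific": r"in\s+\d{4}|since\s+\d{4}",
}
CITATION = re.compile(r"\[\^[\w-]+\]")


def find_uncited_claims(text, radius=50):
    """Return (line_number, kind, matched_text) for stats with no citation within `radius` characters."""
    findings = []
    for kind, pattern in STAT_PATTERNS.items():
        for match in re.finditer(pattern, text):
            # Look at the characters surrounding the match, clamped to the text start.
            window = text[max(0, match.start() - radius): match.end() + radius]
            if not CITATION.search(window):
                line_number = text.count("\n", 0, match.start()) + 1
                findings.append((line_number, kind, match.group(0)))
    return sorted(findings)


# Illustrative input: one uncited percentage, one cited dollar amount.
sample = (
    "Adoption hit 85% of enterprises last year.\n"
    + "Filler sentence without numbers. " * 4 + "\n"
    + "The company reached $100M ARR[^harvey-arr].\n"
)
for finding in find_uncited_claims(sample):
    print(finding)
```

The window is measured in characters and can cross line breaks, so a footnote on the next line still counts. Running this on text that still contains footnote definitions would let a definition line "cite" a nearby statistic, so it is probably best applied to the body only.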

Core Logic (Pseudocode)

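import re

# [^tag] not followed by ':' is an inline reference; [^tag]: at line start is a definition
INLINE_REF = re.compile(r'\[\^([\w-]+)\](?!:)')
DEFINITION = re.compile(r'^\[\^([\w-]+)\]:', re.MULTILINE)

def audit_citations(section_path, words_per_citation):
    # words_per_citation: [YOUR WORDS PER CITATION]
    # read_file, extract_body, count_words, find_all, no_citation_within,
    # extract_urls_from_definitions and find_duplicate_urls are helpers to implement
    content = read_file(section_path)
    body_text = extract_body(content)  # exclude frontmatter, code blocks

    # Collect inline reference tags in the body (tag names only)
    inline_tags = INLINE_REF.findall(body_text)

    # Collect definition tags from the references section
    defined_tags = DEFINITION.findall(content)

    # Count words in the body only (excluding metadata)
    word_count = count_words(body_text)

    # Calculate density per 1,000 words, guarding against empty sections
    density = len(inline_tags) / (word_count / 1000) if word_count else 0.0

    # Find uncited stats (radius is measured in characters)
    stats = find_all(STAT_PATTERNS, body_text)
    uncited = [s for s in stats if no_citation_within(s, body_text, radius=50)]

    # Find orphaned footnotes in both directions by comparing tag names
    orphaned_refs = sorted(set(inline_tags) - set(defined_tags))
    orphaned_defs = sorted(set(defined_tags) - set(inline_tags))

    # Find duplicate URLs
    urls = extract_urls_from_definitions(content)
    duplicates = find_duplicate_urls(urls)

    return {
        'citations': len(inline_tags),
        'words': word_count,
        'density_per_1k': density,
        'benchmark': word_count / words_per_citation,  # expected citation count
        'uncited_claims': uncited,
        'orphaned_refs': orphaned_refs,
        'orphaned_defs': orphaned_defs,
        'duplicate_urls': duplicates
    }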
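
Two of the helpers named above, `extract_body` and `count_words`, could be sketched as follows. This is only one possible reading: it assumes YAML frontmatter delimited by `---` lines at the very top of the file and code blocks fenced with three backticks, and it additionally drops footnote definition lines so they do not inflate the word count (that last step is an assumption, not something the template specifies).

```python
import re

# Assumed conventions: '---' delimited frontmatter at the top, triple-backtick fences.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
FENCED_CODE = re.compile(r"^`{3}.*?^`{3}[^\n]*\n?", re.DOTALL | re.MULTILINE)
DEFINITION_LINE = re.compile(r"^\[\^[\w-]+\]:.*\n?", re.MULTILINE)
FOOTNOTE_MARKER = re.compile(r"\[\^[\w-]+\]")


def extract_body(content):
    """Drop frontmatter, fenced code blocks, and footnote definition lines."""
    body = FRONTMATTER.sub("", content, count=1)
    body = FENCED_CODE.sub("", body)
    return DEFINITION_LINE.sub("", body)


def count_words(body_text):
    """Count whitespace-separated words after removing inline footnote markers."""
    return len(FOOTNOTE_MARKER.sub("", body_text).split())


# Illustrative document assembled line by line.
sample = "\n".join([
    "---",
    "title: Example",
    "---",
    "One cited claim[^a] here.",
    "`" * 3 + "python",
    "print('ignored words')",
    "`" * 3,
    "Two more words.",
    "[^a]: Source A",
    "",
])
body = extract_body(sample)
print(count_words(body))
```

Parsing the frontmatter with a YAML library instead of a regex would be more robust if the metadata itself is needed; the regex here only strips it.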

Output Format

Section 6.1: [Section Title]
  Citations: 8
  Words: 1,187
  Density: 6.7 per 1,000 words
  Benchmark: 7.9 citations (1 per 150 words)
  Status: MEETS BENCHMARK

  Uncited claims:
    Line 45: "85% of enterprises..." -- needs citation
    Line 78: "$4.2 billion market..." -- needs citation

  Orphaned references:
    [^missing-def] -- referenced but never defined

  Duplicate URLs:
    https://example.com/report -- used by [^tag-a] and [^tag-b]
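
A report in this layout can be rendered from the audit result with plain string formatting. The sketch below makes several assumptions: `uncited_claims` is a list of `(line_number, text)` pairs, `duplicate_urls` maps each URL to its footnote tags, and the status compares the citation count against the expected count from the benchmark. The function name `format_report` and the sample values are hypothetical.

```python
def format_report(title, result, words_per_citation):
    """Render an audit result dict as an indented plain-text report."""
    # Assumed rule: the section meets the benchmark when it has at least
    # the expected number of citations for its length.
    meets = result["citations"] >= result["benchmark"]
    lines = [
        title,
        f"  Citations: {result['citations']}",
        f"  Words: {result['words']:,}",
        f"  Density: {result['density_per_1k']:.1f} per 1,000 words",
        f"  Benchmark: {result['benchmark']:.1f} (1 per {words_per_citation} words)",
        f"  Status: {'MEETS BENCHMARK' if meets else 'BELOW BENCHMARK'}",
    ]
    if result["uncited_claims"]:
        lines += ["", "  Uncited claims:"]
        lines += [f'    Line {n}: "{text}..." -- needs citation'
                  for n, text in result["uncited_claims"]]
    if result["orphaned_refs"]:
        lines += ["", "  Orphaned references:"]
        lines += [f"    [^{tag}] -- referenced but never defined"
                  for tag in result["orphaned_refs"]]
    if result["duplicate_urls"]:
        lines += ["", "  Duplicate URLs:"]
        for url, tags in result["duplicate_urls"].items():
            joined = " and ".join(f"[^{tag}]" for tag in tags)
            lines.append(f"    {url} -- used by {joined}")
    return "\n".join(lines)


# Made-up values for illustration only.
example = {
    "citations": 6,
    "words": 1187,
    "density_per_1k": 6 / (1187 / 1000),
    "benchmark": 1187 / 150,
    "uncited_claims": [(45, "85% of enterprises")],
    "orphaned_refs": ["missing-def"],
    "orphaned_defs": [],
    "duplicate_urls": {"https://example.com/report": ["tag-a", "tag-b"]},
}
report = format_report("Section 6.1: [Section Title]", example, 150)
print(report)
```

Empty categories are skipped entirely, which keeps the report short for clean sections.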

Benchmark Targets

Fill in based on your citation density goals:

Section Length	Target Citations	Density
800 words	[N]	[N] per 1K
1,200 words	[N]	[N] per 1K
1,800 words	[N]	[N] per 1K
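
Both columns follow from a single number, the words-per-citation ratio you choose: target citations are section length divided by that ratio, and density per 1K is 1,000 divided by it. The sketch below uses 150 words per citation purely as an illustration (it is the ratio shown in the sample output, not a recommendation); `benchmark_row` is a hypothetical helper name.

```python
def benchmark_row(section_words, words_per_citation):
    """Return (target_citations, density_per_1k), each rounded to one decimal."""
    target_citations = section_words / words_per_citation
    density_per_1k = 1000 / words_per_citation
    return round(target_citations, 1), round(density_per_1k, 1)


# 150 is an illustrative ratio; substitute your own goal.
for words in (800, 1200, 1800):
    target, density = benchmark_row(words, 150)
    print(f"{words} words: {target} citations, {density} per 1K")
```

Because density depends only on the ratio, it is the same for every row; only the target citation count scales with section length.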

Key Features to Build

Citation counting per section
Density calculation against your benchmark
Uncited statistical claim detection with line numbers
Duplicate URL detection (same URL, different footnote tags)
Orphaned footnote detection (both directions)
Per-chapter summary view
--fix mode for auto-standardization of duplicates
--dry-run for previewing fixes before applying
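
As one example from this list, duplicate URL detection can be sketched by grouping footnote tags by the URL found in their definitions. The helper names match those used in the core logic, but their signatures, the URL regex, and the normalization (stripping trailing punctuation and slashes) are assumptions; a `--fix` mode would still need to pick one canonical tag and rewrite the others.

```python
import re
from collections import defaultdict

# Definition line: capture the tag and the rest of the line.
DEFINITION_WITH_BODY = re.compile(r"^\[\^([\w-]+)\]:\s*(.*)$", re.MULTILINE)
# Simple URL matcher; stops at whitespace and common closing brackets.
URL = re.compile(r"https?://[^\s)>\]]+")


def extract_urls_from_definitions(content):
    """Return (tag, url) pairs for every URL inside a footnote definition."""
    pairs = []
    for tag, body in DEFINITION_WITH_BODY.findall(content):
        for url in URL.findall(body):
            # Strip trailing punctuation and slashes so near-identical URLs group together.
            pairs.append((tag, url.rstrip(".,;/")))
    return pairs


def find_duplicate_urls(pairs):
    """Map each URL used by more than one tag to the list of tags using it."""
    tags_by_url = defaultdict(list)
    for tag, url in pairs:
        if tag not in tags_by_url[url]:
            tags_by_url[url].append(tag)
    return {url: tags for url, tags in tags_by_url.items() if len(tags) > 1}


# Illustrative definitions section.
sample = (
    "[^tag-a]: Example Report. https://example.com/report\n"
    "[^tag-b]: Same report again, https://example.com/report/.\n"
    "[^tag-c]: Other source. https://example.com/other\n"
)
duplicates = find_duplicate_urls(extract_urls_from_definitions(sample))
print(duplicates)
```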

Dependencies

Python 3.8+
re (standard library) -- regex matching
pyyaml -- frontmatter parsing
rich (optional) -- colored terminal output