
Research Architecture

Here's what most people get wrong about research for AI-assisted writing: they treat it as something you do during writing. You're mid-paragraph, you need a stat, you open a browser tab, you hunt for 10 minutes, you find something okay, you paste it in, you lose your thread. Multiply that by 81 sections and you've spent more time researching than writing.

The fix: research becomes a pipeline that runs before writing starts. By the time the writer agent opens a section, citation-ready stats, quotes, and company examples are already waiting. The writing session becomes about argument and voice, not evidence hunting.


The Pipeline

Four phases, each feeding the next:

flowchart LR
    subgraph PREP["Preparation"]
        WR[Web Research\nPre-Search] --> PP[Prompt\nDesign]
    end
    subgraph EXEC["Execution"]
        PP --> PA[Perplexity\nAutomation]
        PA --> RA[Raw\nAnswers]
    end
    subgraph PROCESS["Processing"]
        RA --> SY[Synthesis\nExtraction]
        SY --> CI[Citation-Ready\nContent]
    end
    subgraph USE["Integration"]
        CI --> WA[Writer\nAgent]
        CI --> RV[Reviewer\nAgent]
    end

Preparation maps the landscape before you write a single prompt. Execution runs 180+ Perplexity searches automatically. Processing extracts the pieces writers actually need -- stats with credibility scores, quotes with confidence levels, company examples with context. Integration feeds it all into the writer and reviewer agents through 9 purpose-built extraction scripts.
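What "citation-ready" looks like downstream: each synthesized item carries the metadata the writer and reviewer agents need to use it without re-verifying the source. A minimal sketch of such a record in Python -- the field names here are illustrative, not the book's actual schema:

from dataclasses import dataclass

@dataclass
class SynthesisItem:
    kind: str           # "stat", "quote", or "company_example"
    text: str           # the number, claim, or quotation itself
    source: str         # who published or said it, and when
    url: str            # where it was found
    credibility: int    # credibility/confidence score assigned during synthesis
    footnote_key: str   # key the writer agent uses when citing it

# A writer agent can drop item.text into a draft and cite it via
# item.footnote_key without hunting for the source again.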


Research Folder Structure

Every chapter gets the same layout. The structure is the workflow:

research/Chapter_XX/
├── _index.md              # Chapter research map and status dashboard
├── 00_overview.md         # Synthesized overview of the topic landscape
├── web_research/          # Pre-research web searches (Phase 1)
│   ├── 00_landscape.md
│   ├── s_X.X_topic.md
│   ├── company_name.md
│   └── counter_arguments.md
├── prompts/               # Perplexity prompts (Phase 2)
│   ├── s_X.X_topic/
│   │   ├── 01_subtopic.md
│   │   └── 02_subtopic.md
│   └── case_studies/
├── answers/               # Perplexity responses (mirrors prompts/)
│   ├── s_X.X_topic/
│   │   ├── 01_subtopic.md
│   │   └── 02_subtopic.md
│   └── case_studies/
└── synthesis/             # Extracted stats, quotes, frameworks
    ├── s_X.X_topic.md
    └── s_X.Y_topic.md

The answers/ folder mirrors prompts/ exactly. Same subfolder names, same file names. When you run a prompt, the answer lands in the matching location. No hunting.

The _index.md acts as a status dashboard -- which sections have prompts, which have answers, which have synthesis. At a glance, you know where research gaps live.
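Because answers/ mirrors prompts/ and synthesis/ is keyed by section name, checking for gaps is mechanical. A rough sketch of the kind of script that could populate the _index.md dashboard -- the paths follow the layout above, but the function and output format are illustrative, not the project's actual tooling:

from pathlib import Path

def research_status(chapter_dir: str) -> None:
    """List prompts without a matching answer and sections without synthesis."""
    root = Path(chapter_dir)

    # answers/ mirrors prompts/: same subfolders, same file names
    for prompt in sorted((root / "prompts").rglob("*.md")):
        answer = root / "answers" / prompt.relative_to(root / "prompts")
        status = "answered" if answer.exists() else "MISSING ANSWER"
        print(f"{prompt.relative_to(root)}  ->  {status}")

    # synthesis/ holds one file per section (s_X.X_topic.md)
    for section in sorted(p for p in (root / "prompts").iterdir() if p.is_dir()):
        if not (root / "synthesis" / f"{section.name}.md").exists():
            print(f"no synthesis yet: {section.name}")

research_status("research/Chapter_03")   # hypothetical chapter path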


Two-Phase Prompt Design

This is the highest-leverage decision in the entire research pipeline.

Phase 1: Pre-Research. Before writing any Perplexity prompts, you run web searches to map the current landscape. What companies are relevant right now? What data exists? What terms are practitioners using? Results save to web_research/. This takes 20-30 minutes per chapter and prevents a specific failure mode: writing prompts based on what you assume exists rather than what actually does. Without pre-research, you get prompts that return thin results because you're asking about outdated companies or non-existent datasets.

Phase 2: Prompt Writing. Now you write prompts informed by what you found. You know which companies have recent data. You know which stats are out there. You know what the counter-arguments look like. The prompts are sharper -- 30-45 lines each, focused on specific gaps rather than broad surveys. Focused prompts beat broad ones every time. A prompt asking "What are the economics of AI infrastructure?" returns generic analysis. A prompt asking "What does Vercel's serverless AI gateway cost at 100M+ queries/month, and how does that compare to self-hosted alternatives?" returns something you can cite.
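For a sense of shape, a focused prompt file might be organized roughly like this. The section number, chapter thesis, and wording below are a trimmed hypothetical skeleton, not one of the book's actual prompts:

# s_4.2 Serverless AI gateway economics

Context: this chapter argues that managed gateways beat self-hosting
below a certain query volume. Pre-research (web_research/00_landscape.md)
turned up pricing pages but no head-to-head comparison at scale.

Questions:
1. What does a managed serverless AI gateway cost at 100M+ queries/month?
   Cite the provider's own pricing page or a dated benchmark.
2. What does an equivalent self-hosted setup cost, including GPUs and ops?
3. Where is the crossover point, and who has published numbers on it?

Constraints:
- Primary sources and practitioner write-ups over analyst summaries.
- Every number needs a source, a date, and the methodology behind it.
- Flag anything older than 18 months.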


Scale

The book used 180+ prompts across 12 chapters:

  • ~15 prompts per chapter on average
  • Every section gets at least 1 dedicated prompt
  • Anchor examples (key companies, case studies) get their own dedicated prompts
  • Counter-argument prompts for every chapter to ensure balanced coverage

Prompt Types

Different research questions need different prompt structures:

  • Data/Statistics: find exact numbers with methodology and source (shortest prompts)
  • Company Deep-Dive: timeline, tech approach, competitive advantage, recent developments (medium)
  • Comparison: two approaches side-by-side with specific tradeoffs (medium)
  • Counter-Argument: strongest arguments against the chapter's thesis (medium)
  • Practitioner Quote: what founders and CTOs say, not analysts or journalists (short)
  • Gap-Filling: target specific research holes identified during writing (length varies)

The practitioner quote prompts are worth calling out. Early research returned quotes from analysts and journalists -- useful for context, less useful for a book targeting builders. Dedicated prompts that explicitly ask for founder and CTO perspectives produce quotes that actually resonate with that audience.


What This Changes

The research-first approach flipped the economics of evidence. When stats arrive pre-formatted with footnote keys, you use them. When you have to manually hunt down sources mid-sentence, you don't. The book hit 775 citations -- one per 105 words -- not because citations were a goal, but because the pipeline made citing cheap.

The alternative -- writing first, researching later -- produces opinion-heavy, citation-light drafts. The enterprise audience notices.


Deep dives: Perplexity Automation | Synthesis and Extraction | Citation Management