Checklist: Building Your Data Strategy

Use this checklist when designing a data strategy for an AI-first product, evaluating an existing data pipeline, or assessing whether your data creates a defensible competitive advantage. It draws from the six data strategy failure patterns, the data flywheel framework, and the data moats assessment from Chapter 9.

Data Collection

  • You have validated product-market fit before investing in data infrastructure
  • Collection targets are defined by what improves the model, not by what's easiest to capture
  • Signals are captured from production systems and real user interactions, not just synthetic or test data
  • You have identified the specific edge cases your collection pipeline needs to surface (e.g., Tesla's 0.01% automatic edge case detection)
  • Internal users provide continuous usage data to seed the system before external launch
  • Your data strategy covers all five flywheel components: collection, storage, analysis, application, and feedback
  • You know the point at which adding more data stops improving model performance (see the sketch after this list)
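
One way to locate that saturation point is to train on growing subsets and watch where validation gains flatten. A minimal sketch using scikit-learn's `learning_curve`; the model choice and the `min_gain` threshold are illustrative assumptions, not recommendations from the chapter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

def find_data_saturation(X, y, min_gain=0.005):
    """Return the training-set size after which extra data adds
    less than `min_gain` validation score (illustrative threshold)."""
    sizes, _, val_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 8), cv=5,
    )
    means = val_scores.mean(axis=1)
    for i in range(1, len(means)):
        if means[i] - means[i - 1] < min_gain:
            return sizes[i - 1]   # gains flattened at this size
    return None                   # still improving; keep collecting
```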

Data Quality

  • Data accuracy is measured and tracked over time (the U.S. average declined from 63.5% to 26.6% between 2021 and 2024)
  • Automated checks detect duplicates, outliers, and missing values in incoming data (see the first sketch after this list)
  • Schema changes (missing columns, altered formats) trigger alerts, not silent failures
  • Data drift detection monitors shifts in input distributions over time (see the second sketch after this list)
  • Model performance is tracked across segments over time, not just in aggregate
  • Your team spends more time training models than cleaning data; if not, collection standards need tightening
  • AI-generated content is flagged and excluded from training sets to prevent model collapse
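
A minimal batch-check sketch in pandas covering the duplicate, missing-value, outlier, and schema items above; the expected columns and the thresholds are assumptions for illustration, not values from the chapter.

```python
import pandas as pd

EXPECTED_COLUMNS = {"user_id", "event", "value", "ts"}  # assumed schema

def check_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of issues; route non-empty results to alerting."""
    issues = []
    missing_cols = EXPECTED_COLUMNS - set(df.columns)
    if missing_cols:
        issues.append(f"schema drift: missing columns {sorted(missing_cols)}")
    dup = int(df.duplicated().sum())
    if dup:
        issues.append(f"{dup} duplicate rows")
    null_rate = df.isna().mean().max()
    if null_rate > 0.05:                        # illustrative threshold
        issues.append(f"max null rate {null_rate:.1%} exceeds 5%")
    if "value" in df.columns:
        z = (df["value"] - df["value"].mean()) / df["value"].std()
        outliers = int((z.abs() > 4).sum())     # crude z-score outlier test
        if outliers:
            issues.append(f"{outliers} outlier values (|z| > 4)")
    return issues
```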
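
For the drift item, one common approach (not necessarily the chapter's) is a two-sample Kolmogorov-Smirnov test comparing a live window of a feature against a frozen reference window:

```python
from scipy.stats import ks_2samp

def drifted(reference, current, alpha=0.01) -> bool:
    """Flag drift when the KS test rejects 'same distribution'.
    `alpha` is an illustrative significance level; tune per feature."""
    statistic, p_value = ks_2samp(reference, current)
    return p_value < alpha
```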

Flywheel Design

  • You can map your data flow through all five components: Collection, Storage, Analysis, Application, Feedback
  • The break point in your flywheel is identified (typically between storage and analysis, or between analysis and application; see the sketch after this list)
  • Your system has network learning, not just individual learning (deleting one user's data would affect other users' experience)
  • Insights from analysis translate into shipped product improvements, not slide decks
  • Feedback loops measure whether improvements actually generate more and better data
  • Deployment velocity is measured: you can ship, measure, and iterate within days, not months
  • Cold start strategy is defined: expert seeding, targeting complexity over volume, or building the loop architecture before data arrives
  • You have tested whether your flywheel compounds over time or plateaus after initial gains
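
One way to make the break point measurable: tag each record with the furthest flywheel stage it reached, then look for the stage with the biggest falloff. This is a hypothetical instrumentation sketch; the logging scheme is my assumption, not the chapter's method.

```python
from collections import Counter

STAGES = ["collection", "storage", "analysis", "application", "feedback"]

def find_break_point(furthest_stage_per_record: list[str]) -> str:
    """Each entry is the last stage a record passed through."""
    counts = Counter(furthest_stage_per_record)
    # reached[i] = records that made it to stage i or beyond
    reached = [sum(counts[s] for s in STAGES[i:]) for i in range(len(STAGES))]
    drops = [(STAGES[i], 1 - reached[i + 1] / reached[i])
             for i in range(len(STAGES) - 1) if reached[i]]
    return max(drops, key=lambda d: d[1])[0]  # stage with the worst falloff

# e.g. most records stall after storage:
print(find_break_point(["storage", "storage", "analysis", "feedback"]))
```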

Moat Assessment

  • Every data asset considered a competitive advantage has been run through the Moat Test (scored in the sketch after this list):
      ◦ Can a competitor buy this data?
      ◦ Can they scrape it?
      ◦ Can they partner for it?
      ◦ Does usage generate more of it?
      ◦ Is it embedded in customer workflows?
  • You distinguish between data you possess (commodity) and data your system generates through usage (compounding advantage)
  • Static datasets are treated as depreciating assets, not durable moats
  • Your moat strategy focuses on at least one of: workflow integration, execution velocity, or systems of intelligence
  • The five conditions for defensible data are evaluated:
      ◦ Continuous refreshment through usage
      ◦ High-quality domain specificity with intelligent curation
      ◦ Data governance that creates procurement advantage
      ◦ Deep workflow integration that creates switching costs
      ◦ Network effects that compound with each user
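
The Moat Test lends itself to a simple scorecard. An illustrative encoding follows; the equal weighting is my assumption, not the chapter's.

```python
from dataclasses import dataclass

@dataclass
class DataAsset:
    can_be_bought: bool
    can_be_scraped: bool
    can_be_partnered_for: bool
    usage_generates_more: bool
    embedded_in_workflows: bool

def moat_score(asset: DataAsset) -> int:
    """0 = commodity data, 5 = compounding advantage (illustrative scale)."""
    return sum([
        not asset.can_be_bought,
        not asset.can_be_scraped,
        not asset.can_be_partnered_for,
        asset.usage_generates_more,
        asset.embedded_in_workflows,
    ])

scraped_corpus = DataAsset(True, True, True, False, False)
usage_exhaust = DataAsset(False, False, False, True, True)
print(moat_score(scraped_corpus), moat_score(usage_exhaust))  # 0 5
```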

Infrastructure

  • Infrastructure complexity matches your current stage, not your aspirational scale
  • You can name the specific bottleneck each infrastructure component solves
  • Teams don't spend days or weeks configuring infrastructure for each new workload
  • Multi-provider strategies with fallback options are in place before reaching production scale (see the failover sketch after this list)
  • Systems can switch between AI providers or degrade gracefully during outages
  • Data is accessible across teams, not siloed in department-specific stores
  • Unit economics are viable: gross margins are above 40% (AI wrappers average 25-60%, vs. 70-90% for traditional SaaS; see the margin check after this list)
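
A minimal failover sketch for the multi-provider items; the provider functions here are hypothetical stubs standing in for real client calls, and the cached fallback is one possible degradation path among several.

```python
def call_primary(prompt: str) -> str:
    raise TimeoutError("primary provider down")   # stub: simulate an outage

def call_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"                 # stub: second provider

def cached_answer(prompt: str) -> str:
    return "[degraded] serving last known good response"

def complete(prompt: str) -> str:
    for provider in (call_primary, call_fallback):
        try:
            return provider(prompt)
        except Exception:
            continue                              # try the next provider
    return cached_answer(prompt)                  # degrade, don't fail

print(complete("summarize this ticket"))          # -> "[fallback] ..."
```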
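
And a trivial check against the 40% bar; the example revenue and cost figures are made up for illustration.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    return (revenue - cogs) / revenue

# $100k revenue against $55k of inference + infrastructure cost:
print(f"{gross_margin(100_000, 55_000):.0%}")  # 45%, above the 40% bar
```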

Governance

  • Data collection complies with applicable privacy regulations (EU AI Act, industry-specific requirements)
  • Consent and data usage terms are clear to users and legally reviewed
  • Audit trails exist for how data flows through the system and into model training (see the sketch after this list)
  • Access controls define who can read, write, and delete training data
  • Data retention and deletion policies are documented and enforced
  • Governance architecture anticipates regulatory tightening rather than reacting to it
  • Compliance capabilities are treated as competitive advantage in enterprise procurement, not just a cost center
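
A minimal append-only audit-trail sketch covering the lineage item above; the event names, field names, and JSON-lines format are assumptions for illustration.

```python
import json
import time

def audit(event: str, record_id: str, actor: str, path="audit.log"):
    """Append one lineage event per line; never rewrite past entries."""
    entry = {"ts": time.time(), "event": event,
             "record_id": record_id, "actor": actor}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

audit("collected", "rec-123", "ingest-service")
audit("included_in_training", "rec-123", "train-job-42")
```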

Source frameworks:

  • The 6 Data Strategy Mistakes That Stall Flywheels
  • Building Data Flywheels
  • Data Moats: What's Defensible vs. Replicable

Full chapter: Chapter 9: Data Strategy