Data as Product¶

In traditional software companies, data is a byproduct. Users generate logs, analytics teams analyze them, maybe product teams get a quarterly report. The data serves the product.

AI-first companies invert this relationship. Data is the product. Every user interaction generates training signal. Every correction teaches. Every edge case reveals a gap that, once filled, benefits every future user. The product serves the data as much as data serves the product.

This isn't a subtle distinction. It's the difference between competing on features and competing on compound learning.

The Data Flywheel¶

Jensen Huang put it directly at Snowflake Summit in 2024: "AI gives every company an opportunity to turn its processes into a data flywheel." His advice? Companies need to "take all the most important processes they do, capture them in a data flywheel and turn that into the company's AI to drive that flywheel even further"¹.

The loop is simple: usage generates signal, signal improves the model, better models drive more usage. The longer it spins, the harder you become to catch.

flowchart LR
    A[Users Interact] --> B[Signal Captured]
    B --> C[Model Improves]
    C --> D[Better Experience]
    D --> A

    style A fill:#1e6fa5
    style B fill:#c77d0a
    style C fill:#1a8a52
    style D fill:#c03030

NVIDIA: The Flywheel That Designs Chips¶

NVIDIA's flywheel is almost unfair. They've accumulated decades of chip design data—design history, test results, performance characteristics, failure modes. This wasn't strategic foresight. It was operational necessity: you can't design chips without tracking what works.

Now that advantage compounds. NVIDIA uses LLMs powered by proprietary data no competitor can access—design decisions, test results, performance optimizations discovered through iteration. The result? AI agents that assist in designing new GPUs using accumulated knowledge².

AMD or Intel could build similar tools. They can't replicate NVIDIA's training data.

Klarna: Customer Service That Teaches Itself¶

Klarna's AI assistant handles customer service in 35+ languages. Before the AI: average customer response time was 11 minutes. After: 2 minutes³. But the real story isn't the speed improvement. It's that every conversation makes the system smarter.

Klarna isn't just saving money on support costs. They're building an asset—a customer service capability that gets better with scale rather than more expensive.

What Makes Flywheels Work (And Fail)¶

Not every company that collects data builds a flywheel. The data has to actually flow through the loop, not just accumulate in a warehouse. Three requirements:

The data must be structured for learning. Spotify doesn't just log that you skipped a song. They capture when you skipped (30 seconds in = strong negative signal), what you skipped to, what context you were in. The company processes 500 billion daily events from 678 million users—but the structure of that data matters more than its volume⁴.

The feedback loop must be fast. Duolingo's Birdbrain AI processes 1.25 billion exercises daily with real-time feedback loops—not weekly batch jobs⁵. Continuous improvement, exercise by exercise.

The improvement must be visible to users. Flywheels stall when users don't experience the benefit. If your AI gets 5% better but users can't tell, they have no reason to keep generating signal.

The risk: filter bubbles. Recommendation models naturally create "rich-get-richer" feedback loops where popular items dominate. Spotify's solution? A dedicated "Algorithmic Responsibility" research team to combat these effects⁶. The flywheel can spin in unhelpful directions if you don't actively steer it.

Companies with mature data flywheels report over 98% savings in inference costs without compromising accuracy⁷. That's not incremental—it's the kind of cost structure that makes previously impossible products viable.

The Question Every AI-First Company Must Answer¶

The data flywheel isn't optional. It's the mechanism through which AI-first companies compound their advantage.

The question you need to answer: What data does our product generate, and how does that data make our product better?

If you can't articulate the loop clearly—usage generates X signal, which improves Y capability, which drives Z outcome—you don't have a flywheel. You have data accumulation without compound learning. Chapter 9 dives deep into the architecture patterns that make flywheels sustainable.

References¶

← Previous: The Build vs Buy Calculus | Chapter Overview | Next: Iteration Speed and Learning Loops →

Jensen Huang at Snowflake Summit, June 2024 — snowflake.com ↩
NVIDIA AI for Chip Design — nvidia.com ↩
Klarna AI Assistant Performance — klarna.com ↩
Spotify Engineering on Recommendation Data — engineering.atspotify.com ↩
Duolingo Birdbrain Technical Overview — duolingo.com ↩
Spotify Algorithmic Responsibility Research — research.atspotify.com ↩
NVIDIA Data Flywheel Research — nvidia.com ↩