
Building for Evolution

Stripe's 13-year backward compatibility record isn't magic1. It's architecture. All internal code runs on the latest version. Transformation modules walk responses "back in time" to whatever version each customer expects. A hundred breaking changes absorbed. Zero forced migrations.

That's building for evolution: systems that absorb change. The companies that capture AI improvements fastest aren't rewriting code—they're updating configuration.

The Strangler Fig Pattern

The strangler fig tree grows around its host, gradually replacing it while both coexist. For AI modernization:

Phase 1: Wrap. New AI capability intercepts requests. Legacy still handles work. Risk: minimal.

Phase 2: Route. Traffic shifts gradually—1%, then 10%, then 50%. A/B test against legacy. Risk: controlled. (See the routing sketch after the diagram.)

Phase 3: Replace. Legacy retires when new system proves stable. Risk at cutover: minimal.

Phase 4: Repeat. Next capability follows the same playbook.

```mermaid
flowchart LR
    subgraph P1["Phase 1: Wrap"]
        direction TB
        W1[New AI Capability] --> W2[Intercepts Requests]
        W2 --> W3[Legacy Handles Work]
    end

    subgraph P2["Phase 2: Route"]
        direction TB
        R1[Traffic Split] --> R2["1% → 10% → 50%"]
        R2 --> R3[A/B Test vs Legacy]
    end

    subgraph P3["Phase 3: Replace"]
        direction TB
        X1[New System Proven] --> X2[Legacy Retires]
    end

    subgraph P4["Phase 4: Repeat"]
        direction TB
        Y1[Next Capability] --> Y2[Same Playbook]
    end

    P1 --> P2 --> P3 --> P4
    P4 -.->|"Continuous"| P1

    style P1 fill:#1e6fa5,stroke:#155a85
    style P2 fill:#c77d0a,stroke:#a06508
    style P3 fill:#1a8a52,stroke:#14693e
    style P4 fill:#7345b0,stroke:#5b3590
```
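
In code, the routing phase reduces to a percentage check in front of two handlers. A minimal TypeScript sketch, with hypothetical handleWithAI and handleWithLegacy handlers standing in for the real paths:

```typescript
// Minimal strangler-fig router sketch. handleWithAI, handleWithLegacy, and
// ROLLOUT_PERCENT are illustrative names, not from any cited system.

type Req = { id: string; payload: string };
type Res = { body: string; servedBy: "ai" | "legacy" };

const ROLLOUT_PERCENT = 10; // Phase 2: raise gradually (1 -> 10 -> 50)

async function handleWithAI(req: Req): Promise<Res> {
  return { body: `ai:${req.payload}`, servedBy: "ai" };
}

async function handleWithLegacy(req: Req): Promise<Res> {
  return { body: `legacy:${req.payload}`, servedBy: "legacy" };
}

// Hash the request id so the same caller always lands in the same bucket,
// which keeps A/B comparisons stable across retries.
function bucket(id: string): number {
  let h = 0;
  for (const c of id) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100;
}

export async function route(req: Req): Promise<Res> {
  if (bucket(req.id) < ROLLOUT_PERCENT) {
    try {
      return await handleWithAI(req); // Phase 2: sampled traffic
    } catch {
      return handleWithLegacy(req); // Phase 1 safety net: legacy still handles work
    }
  }
  return handleWithLegacy(req);
}
```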

Salesforce used this approach to integrate Bedrock Custom Model Import while maintaining existing SageMaker endpoints2—30% faster deployments, 40% cost savings, full backward compatibility. Big-bang rewrites, by contrast, carry catastrophic risk. That's a bet most AI migrations lose.

Model Abstraction: The Foundation

Without an abstraction layer: SDK calls scattered through the codebase, vendor lock-in, zero leverage.

With one: changing models means updating a routing configuration. Your application code doesn't know whether it's talking to GPT-4, Claude, or Gemini.

The Vercel AI SDK demonstrates this pattern. A single generateText call works identically regardless of which provider backs it3. Switching providers means changing a model string parameter—no rewriting application code.
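
A sketch of the pattern: generateText and the provider helpers below are the SDK's documented interface, while the models config object is an illustrative stand-in for wherever your routing configuration lives.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Illustrative routing config: the provider choice lives here, not in feature code.
const models = {
  primary: openai("gpt-4o"),
  fallback: anthropic("claude-3-5-sonnet-20241022"),
};

// Application code sees only the unified generateText interface.
const { text } = await generateText({
  model: models.primary, // switching providers means editing this one entry
  prompt: "Summarize the customer ticket in two sentences.",
});
console.log(text);
```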

AWS didn't force SageMaker customers to migrate when it launched Bedrock. Models from SageMaker JumpStart can register with Bedrock through a simple UI workflow while maintaining existing infrastructure4. The pattern holds: abstraction gives you options on performance, pricing, and capabilities without rewrites.

```mermaid
flowchart TB
    subgraph APP["Application Layer"]
        A1[Feature A]
        A2[Feature B]
        A3[Feature C]
    end

    subgraph ABSTRACTION["Model Abstraction Layer"]
        direction LR
        AB["generateText()<br/>Unified Interface"]
    end

    subgraph PROVIDERS["Provider Layer"]
        direction LR
        P1["GPT-4"]
        P2["Claude"]
        P3["Gemini"]
    end

    A1 & A2 & A3 --> AB
    AB --> P1 & P2 & P3

    CONFIG["Config Change"] -.->|"Switch Provider"| AB

    style ABSTRACTION fill:#1a8a52,stroke:#14693e
    style CONFIG fill:#c77d0a,stroke:#a06508
```

The 4 Patterns That Enable Evolution

1. Interface Contracts

Each module exposes a well-defined API surface. Clear inputs, clear outputs, machine-readable specs. Design for agents first—if an AI agent can use your interface, humans can too. Stripe's date-based versions like 2024-05-24 with automatic pinning exemplify this1.

Skip this and changes cascade. One team's model upgrade breaks three others.
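
As a sketch, here is what such a contract might look like for a hypothetical summarization module; the types themselves are the machine-readable spec, and every name below is invented for illustration.

```typescript
// Hypothetical contract for a summarization module. Callers depend on this
// surface, never on which model sits behind it.

export interface SummarizeRequest {
  apiVersion: "2024-05-24"; // date-based version, pinned per caller
  documentText: string;
  maxSentences?: number; // defaults to 3 if omitted
}

export interface SummarizeResponse {
  apiVersion: "2024-05-24";
  summary: string;
  modelId: string; // for observability only; not a contract guarantee
}

export interface Summarizer {
  summarize(req: SummarizeRequest): Promise<SummarizeResponse>;
}
```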

2. Feature Flags for AI

Feature flags decouple deployment from release. For AI: gradual rollouts (1% to 10% to 100%)5, targeting by user segment, kill switches for incidents, A/B tests with statistical rigor.
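
A minimal sketch of such a gate, assuming a hypothetical flag record fetched from whatever flag service you run; hashing the user id keeps each assignment stable:

```typescript
// Illustrative flag shape; in practice this comes from a flag service or
// config table, not a constant.
interface AIFlag {
  enabled: boolean;        // kill switch: flip to false during an incident
  rolloutPercent: number;  // 1 -> 10 -> 100
  allowSegments: string[]; // e.g. ["internal", "beta"]
}

const newModelFlag: AIFlag = {
  enabled: true,
  rolloutPercent: 10,
  allowSegments: ["beta"],
};

function useNewModel(userId: string, segment: string, flag: AIFlag): boolean {
  if (!flag.enabled) return false;                        // incident: everyone stays on the old model
  if (flag.allowSegments.includes(segment)) return true;  // targeted segments opt in early
  let h = 0;
  for (const c of userId) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % 100 < flag.rolloutPercent;                   // stable percentage bucket
}
```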

Microsoft Azure implements shadow mode: new models process requests but don't serve responses, logging predictions for offline comparison6. Skip this and every deployment is a bet-the-company moment.

3. Versioned APIs with Graceful Deprecation

Anthropic sets the standard: migrating from Claude Sonnet 3.5 to 4.0 requires updating one model ID7. Zero breaking changes. OpenAI's stated policy provides 6-12 month deprecation windows with email notifications8.

What happens in practice: OpenAI's function calling changes in October 2024 broke production systems with minimal warning9. Models returned function responses as regular messages. The gap between stated policy and operational reality is why abstraction layers matter.
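
The Stripe-style remedy from the opening, transformation modules that walk a response back in time, can be sketched as a chain of one-step downgrades. All versions and field changes below are invented for illustration:

```typescript
// Internal code always produces the latest response shape; each transform
// downgrades it by exactly one version step until it matches the caller's pin.
type Version = "2024-05-24" | "2023-10-16" | "2023-08-01";
type Payload = Record<string, unknown>;

const ordered: Version[] = ["2024-05-24", "2023-10-16", "2023-08-01"];

const downgrades: Record<string, (r: Payload) => Payload> = {
  // Invented change: the 2024-05-24 version renamed `amount` to `amount_total`.
  "2024-05-24->2023-10-16": ({ amount_total, ...rest }) => ({ ...rest, amount: amount_total }),
  // Invented change: the 2023-10-16 version introduced a `metadata` field.
  "2023-10-16->2023-08-01": ({ metadata: _dropped, ...rest }) => rest,
};

export function renderForVersion(latest: Payload, pinned: Version): Payload {
  let out = latest;
  for (let i = 0; ordered[i] !== pinned; i++) {
    out = downgrades[`${ordered[i]}->${ordered[i + 1]}`](out);
  }
  return out;
}
```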

4. Shadow Testing and Gradual Rollouts

Microsoft's pattern: shadow mode to 5% traffic to 50% to 100%, with validation metrics at each gate6. Define success criteria before starting—shadow mode doubles inference costs, so you need clear thresholds that tell you when to promote and when to abandon.

```mermaid
flowchart LR
    S["Shadow Mode<br/>Runs but doesn't serve"] --> T1["5% Traffic<br/>Validate metrics"]
    T1 --> T2["50% Traffic<br/>Statistical confidence"]
    T2 --> T3["100% Traffic<br/>Full rollout"]

    T1 -.->|"Metrics fail"| ABORT["Abandon"]
    T2 -.->|"Metrics fail"| ABORT

    style S fill:#1e6fa5,stroke:#155a85
    style T1 fill:#c77d0a,stroke:#a06508
    style T2 fill:#c77d0a,stroke:#a06508
    style T3 fill:#1a8a52,stroke:#14693e
    style ABORT fill:#c03030,stroke:#9a2020
```

Skip this and you launch blind. First feedback comes from angry customers, not dashboards.
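
A minimal shadow-mode sketch: primary, candidate, and logComparison are illustrative names, and the candidate call is fire-and-forget so its latency and failures never reach the user.

```typescript
// The candidate model sees live traffic but never answers it; its output is
// logged next to the served response for offline comparison.

type Model = (input: string) => Promise<string>;

async function serveWithShadow(
  input: string,
  primary: Model,
  candidate: Model,
  logComparison: (record: object) => void,
): Promise<string> {
  const served = await primary(input); // only this reaches the user

  // Fire-and-forget: shadow latency and errors must not affect the caller.
  candidate(input)
    .then((shadow) => logComparison({ input, served, shadow, ts: Date.now() }))
    .catch((err) => logComparison({ input, shadowError: String(err) }));

  return served;
}
```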

The Yirifi Principle

"The architecture that works today is designed to absorb the capabilities of tomorrow."

Every component should answer one question: what happens when the underlying AI improves? Configuration change and rollout monitoring? That's evolution-ready. Major rewrite required? That's a design flaw you'll pay for repeatedly.


References

