Chapter 3: The AI Landscape -- Resources

Curated resources for deeper exploration of topics covered in this chapter.

Frameworks from This Chapter

  • Foundation Models Landscape -- The 4-layer AI stack (Foundation Models, Providers, Aggregators, Applications) and model capability comparisons.
  • 6 Questions Before Choosing a Model -- The sequential decision framework: use case, latency tolerance, compliance, cost structure, explainability, and switching tolerance.
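The six questions above form a sequential filter: each one narrows the candidate pool before the next is asked. A minimal sketch of that flow, with entirely hypothetical model attributes and field names:

```python
# Illustrative sketch of the 6-question decision filter.
# All attributes and thresholds here are hypothetical, not real model data.
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str
    fits_use_case: bool        # Q1: does it handle the task at all?
    p50_latency_ms: int        # Q2: median response latency
    compliant: bool            # Q3: meets required certifications (e.g. HIPAA)
    cost_per_m_tokens: float   # Q4: blended cost per million tokens
    explainable: bool          # Q5: can outputs be audited/explained?
    easy_to_switch: bool       # Q6: low lock-in / portable prompts

def shortlist(options, max_latency_ms, max_cost,
              needs_compliance, needs_explainability):
    """Apply the six questions in order, narrowing the pool at each step."""
    survivors = [m for m in options if m.fits_use_case]                        # Q1
    survivors = [m for m in survivors if m.p50_latency_ms <= max_latency_ms]   # Q2
    if needs_compliance:
        survivors = [m for m in survivors if m.compliant]                      # Q3
    survivors = [m for m in survivors if m.cost_per_m_tokens <= max_cost]      # Q4
    if needs_explainability:
        survivors = [m for m in survivors if m.explainable]                    # Q5
    # Q6: prefer low-switching-cost options among what remains
    return sorted(survivors, key=lambda m: not m.easy_to_switch)
```

The ordering matters: hard constraints (use case, latency, compliance) eliminate options before softer trade-offs (cost, explainability, switching) rank what is left.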

Tools & Platforms

Foundation Models

  • Claude (Anthropic) -- 97.8% security compliance for code generation; 200K token context windows; 90% cache hit discount.
  • GPT (OpenAI) -- 85.4% multimodal MMMU benchmark leader; SOC 2, HIPAA with signed BAAs.
  • Gemini (Google) -- 87.6% on Video-MMMU; 2M token context windows; 99.9% SLA commitments.
  • DeepSeek -- $0.07/million tokens with cache hits; 671B parameter Mixture of Experts model trained for ~$6M.
  • Llama (Meta) -- Open-weight model family; Shopify runs 40-60 million daily inferences on LLaVA, a fine-tuned Llama-based vision model.
  • Mistral -- Leads time-to-first-token at 0.30 seconds; open-weight models for European regulatory environments.

Providers & Cloud Wrappers

  • OpenAI API -- Ecosystem leader; SOC 2, HIPAA with BAAs; Batch API offers 50% discount.
  • Anthropic API -- Prompt caching at 90% discount; safety-focused; rate limit considerations.
  • Google Vertex AI -- 99.9% SLA with financial credits; FedRAMP High authorization.
  • Azure OpenAI Service -- Enterprise wrapper with Azure compliance and private networking.
  • AWS Bedrock -- Multi-model access through AWS infrastructure; pre-negotiated cloud discounts.
  • together.ai -- Third-party host for open-source models with different cost/performance trade-offs.
  • Fireworks AI -- Model serving platform; helped Notion achieve 350ms latency.
  • Replicate -- Open-source model hosting with pay-per-use pricing.
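The cache and batch discounts quoted above change effective pricing significantly, so it is worth doing the blended arithmetic. A simplified sketch using the rates from the Anthropic entry (90% discount on cache hits, 25% premium on cache writes); real pricing applies the write premium only to tokens newly written to the cache, so treat this as an upper-bound estimate:

```python
def effective_input_price(base_price, cache_hit_rate,
                          hit_discount=0.90, write_premium=0.25):
    """Blend cached and uncached input-token prices per million tokens.

    Simplified model: every non-hit token is assumed to pay the cache-write
    premium, which overstates cost slightly versus real billing.
    """
    hit_price = base_price * (1 - hit_discount)      # discounted cache hits
    miss_price = base_price * (1 + write_premium)    # misses pay write premium
    return cache_hit_rate * hit_price + (1 - cache_hit_rate) * miss_price
```

At a hypothetical $3.00/M base price with an 80% hit rate, the blended price works out to $0.99/M, roughly a third of list price, which is why cache hit rate is often the single biggest lever on API spend.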

Aggregators & Routing

  • OpenRouter -- Single API to 100+ models; 5% markup; :nitro for speed, :floor for lowest price.
  • LiteLLM -- Open-source self-hosted gateway; 3ms P50 / 17ms P90 latency overhead.
  • Portkey -- Enterprise LLM gateway starting at $49/month; 250+ LLMs; semantic caching and audit trails.
  • RouteLLM -- Research showing 85% cost reduction while maintaining 90-95% of GPT-4 quality.
  • Helicone -- LLM observability platform; built-in caching can cut costs 20-30%.
  • Langfuse -- Open-source LLM observability and evaluation platform.
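The gateways above implement two patterns worth understanding even if you never build them yourself: cost-based routing (the RouteLLM idea of sending easy prompts to a cheap model) and ordered fallback (what LiteLLM and OpenRouter automate). A toy sketch of both, with hypothetical model names and a made-up complexity score:

```python
def route(complexity: float, threshold: float = 0.5) -> str:
    """RouteLLM-style idea: cheap model for easy prompts, frontier model
    for hard ones. The complexity score and names are illustrative."""
    return "cheap-model" if complexity < threshold else "frontier-model"

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try (name, callable) provider pairs in order until one succeeds."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In production the router's "complexity" signal is usually a small classifier trained on preference data; the fallback chain is where the 85% cost-reduction figure from RouteLLM's research comes from, since most traffic never reaches the expensive model.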

Fine-Tuning & Evaluation

  • LoRA (Low-Rank Adaptation) -- Reduces GPU memory requirements by up to 3x for fine-tuning.
  • PromptLayer -- Prompt versioning and management as code.
  • Maxim -- Prompt versioning and AI evaluation platform.
  • Vespa -- Distributed search and ranking engine; powers Perplexity's 200 billion URL index.
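LoRA's savings come from simple arithmetic: instead of updating a full d-by-k weight matrix, it trains two low-rank factors of rank r, so trainable parameters drop from d*k to r*(d + k). A minimal sketch of that count (the "up to 3x" GPU-memory figure above reflects whole-run savings on gradients and optimizer state, not this per-matrix ratio):

```python
def lora_trainable_params(d: int, k: int, r: int):
    """Trainable parameters for one d-by-k weight matrix:
    full fine-tuning updates d*k values; rank-r LoRA trains
    factors B (d x r) and A (r x k), i.e. r*(d + k) values."""
    full = d * k
    lora = r * (d + k)
    return full, lora, full / lora
```

For a hypothetical 4096x4096 attention projection at rank 8, LoRA trains 65,536 parameters instead of about 16.8 million, a 256x reduction for that matrix.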

Provider Comparison Summary

| Provider      | Strength                 | Key SLA                      | Cache Discount                 |
| ------------- | ------------------------ | ---------------------------- | ------------------------------ |
| OpenAI        | Ecosystem depth, brand   | ~99.3% uptime                | 50% automatic                  |
| Anthropic     | Safety, long context     | Rate limit tiers             | 90% on hits (25% write premium)|
| Google Vertex | Enterprise SLA           | 99.9% with financial credits | Varies                         |
| DeepSeek      | Cost ($0.07/M tokens)    | No enterprise SLA            | N/A                            |
| Meta (Llama)  | Open weights, self-host  | Self-managed                 | N/A                            |