Chapter 3: The AI Landscape -- Resources
Curated resources for deeper exploration of topics covered in this chapter.
Frameworks from This Chapter
- Foundation Models Landscape -- The 4-layer AI stack (Foundation Models, Providers, Aggregators, Applications) and model capability comparisons.
- 6 Questions Before Choosing a Model -- The sequential decision framework: use case, latency tolerance, compliance, cost structure, explainability, and switching tolerance.
Tools & Platforms
Foundation Models
- Claude (Anthropic) -- 97.8% security compliance for code generation; 200K token context windows; 90% cache hit discount.
- GPT (OpenAI) -- 85.4% multimodal MMMU benchmark leader; SOC 2, HIPAA with signed BAAs.
- Gemini (Google) -- 87.6% on Video-MMMU; 2M token context windows; 99.9% SLA commitments.
- DeepSeek -- $0.07/million tokens with cache hits; 671B parameter Mixture of Experts model trained for ~$6M.
- Llama (Meta) -- Open-weight model; Shopify runs 40-60 million LLaVA inferences per day using fine-tuned Llama.
- Mistral -- Leads time-to-first-token at 0.30 seconds; open-weight models for European regulatory environments.
Providers & Cloud Wrappers
- OpenAI API -- Ecosystem leader; SOC 2, HIPAA with BAAs; Batch API offers 50% discount.
- Anthropic API -- Prompt caching at 90% discount; safety-focused; rate limit considerations.
- Google Vertex AI -- 99.9% SLA with financial credits; FedRAMP High authorization.
- Azure OpenAI Service -- Enterprise wrapper with Azure compliance and private networking.
- AWS Bedrock -- Multi-model access through AWS infrastructure; pre-negotiated cloud discounts.
- together.ai -- Third-party host for open-source models with different cost/performance trade-offs.
- Fireworks AI -- Model serving platform; helped Notion achieve 350ms latency.
- Replicate -- Open-source model hosting with pay-per-use pricing.
Aggregators & Routing
- OpenRouter -- Single API to 100+ models; 5% markup; `:nitro` suffix for speed, `:floor` for lowest price.
- LiteLLM -- Open-source self-hosted gateway; 3ms P50 / 17ms P90 latency overhead.
- Portkey -- Enterprise LLM gateway starting at $49/month; 250+ LLMs; semantic caching and audit trails.
- RouteLLM -- Research showing 85% cost reduction while maintaining 90-95% of GPT-4 quality.
- Helicone -- LLM observability platform; built-in caching can cut costs 20-30%.
- Langfuse -- Open-source LLM observability and evaluation platform.
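The routing idea behind these tools (RouteLLM's cost reduction at near-GPT-4 quality, OpenRouter's price- and speed-optimized variants) reduces to a simple rule: send each request to the cheapest model whose expected quality clears a bar. The sketch below illustrates that rule; the model names, prices, and quality scores are hypothetical placeholders, not real benchmark data.

```python
# Minimal cost-aware router sketch. All model names, prices, and
# quality scores below are hypothetical, for illustration only.

MODELS = [
    # (name, price per million tokens in USD, rough quality score 0-100)
    ("small-model", 0.07, 78),
    ("mid-model", 0.50, 85),
    ("frontier-model", 5.00, 95),
]

def route(min_quality: float) -> str:
    """Return the cheapest model whose quality score meets min_quality."""
    eligible = [m for m in MODELS if m[2] >= min_quality]
    if not eligible:
        # No model clears the bar: fall back to the highest-quality one.
        return max(MODELS, key=lambda m: m[2])[0]
    return min(eligible, key=lambda m: m[1])[0]

print(route(80))  # mid-model
print(route(90))  # frontier-model
```

Production gateways layer fallbacks, rate limits, and observability on top of this core decision, but the cost/quality trade-off is the heart of it.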
Fine-Tuning & Evaluation
- LoRA (Low-Rank Adaptation) -- Reduces GPU memory requirements by up to 3x for fine-tuning.
- PromptLayer -- Prompt versioning and management as code.
- Maxim -- Prompt versioning and AI evaluation platform.
- Vespa -- Distributed search and ranking engine; powers Perplexity's 200 billion URL index.
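LoRA's memory savings come from freezing the base weights and training only two small low-rank matrices per adapted layer. A back-of-the-envelope calculation (layer dimensions and rank below are hypothetical, chosen only to show the scale of the reduction):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params for one LoRA adapter:
    A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank

# Hypothetical 4096x4096 projection matrix, LoRA rank 8:
full = 4096 * 4096                            # 16,777,216 params in full fine-tuning
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 params with LoRA
print(f"LoRA trains {lora / full:.2%} of the full matrix")  # ~0.39%
```

The overall GPU memory saving is smaller than the parameter-count reduction suggests (activations and frozen weights still occupy memory), which is why the headline figure is "up to 3x" rather than hundreds of times.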
Further Reading
- How Perplexity Built an AI Google Competitor (ByteByteGo) -- Deep dive into Perplexity's multi-model routing, Vespa search stack, and "smallest viable model" approach.
- Stanford HAI AI Index Report 2025 -- Performance gap between top and 10th-ranked model shrank from 11.9% to 5.4%.
- Menlo Ventures: 2025 State of Generative AI in the Enterprise -- Enterprise spending on generative AI grew 3.2x to $37 billion; 76% of use cases are now purchased rather than built in-house.
- Hacker News Discussion on LangChain Abstractions -- Developer backlash against over-abstraction; the emergence of LangGraph as a response.
- Expanding Harvey's Model Offerings -- Harvey's multi-model approach: different models excel at different legal subtasks.
- Choosing the Right Model in Cursor -- Practical guide to multi-model usage in AI coding tools.
- Claude Code vs Cursor (Qodo) -- Replit's Head of AI on why Claude is "by far the best model" for code generation.
Research & Data
- LMSYS Chatbot Arena -- Community-driven model evaluation; Elo differences under 50 points are "basically a toss-up."
- Lost in the Middle (arXiv) -- Research on performance degradation when relevant information sits in the middle of long contexts.
- Cisco: Security Evaluation of DeepSeek -- 100% attack success rate on DeepSeek-R1; fails to block any harmful prompts.
- Qualys DeepSeek Security Assessment -- DeepSeek generates insecure code at 4x the rate of competitors.
- Bain & Company: DeepSeek Analysis -- DeepSeek V3 trained for ~$6M versus estimated $100M+ for GPT-4.
- IDC: The Future of AI is Model Routing -- By 2028, 70% of top AI-driven enterprises will use multi-model architectures.
- CFM Case Study (HuggingFace) -- Capital Fund Management achieved solutions 80x cheaper than large LLMs with LoRA fine-tuning.
- Glean AI Evaluator -- How Glean measures context relevance and recall rates for enterprise search evaluation.
Community & Learning
- LMSYS Chatbot Arena Leaderboard -- Community-driven, open platform for evaluating LLMs through human preference.
- Hugging Face -- Open-source AI community; hosts models, datasets, and case studies like CFM's LoRA fine-tuning.
- Data-Centric AI (datacentricai.org) -- Community and resources focused on improving AI through data quality rather than model architecture.
Provider Comparison Summary
| Provider | Strength | Key SLA | Cache Discount |
|---|---|---|---|
| OpenAI | Ecosystem depth, brand | ~99.3% uptime | 50% automatic |
| Anthropic | Safety, long context | Rate limit tiers | 90% on hits (25% write premium) |
| Google Vertex | Enterprise SLA | 99.9% with financial credits | Varies |
| DeepSeek | Cost ($0.07/M tokens) | No enterprise SLA | N/A |
| Meta (Llama) | Open weights, self-host | Self-managed | N/A |