Chapter 3: The AI Landscape -- Resources

Curated resources for deeper exploration of topics covered in this chapter.

Frameworks from This Chapter

  • Foundation Models Landscape -- The 4-layer AI stack (Foundation Models, Providers, Aggregators, Applications) and model capability comparisons.
  • 6 Questions Before Choosing a Model -- The sequential decision framework: use case, latency tolerance, compliance, cost structure, explainability, and switching tolerance.
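The six questions above form a sequential filter: each one narrows the candidate pool before the next is asked. A minimal sketch of that flow, with entirely hypothetical model attributes and field names:

```python
# Illustrative sketch of the 6-question decision filter.
# All attributes and thresholds here are hypothetical, not real model data.
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str
    fits_use_case: bool        # Q1: does it handle the task at all?
    p50_latency_ms: int        # Q2: median response latency
    compliant: bool            # Q3: meets required certifications (e.g. HIPAA)
    cost_per_m_tokens: float   # Q4: blended cost per million tokens
    explainable: bool          # Q5: can outputs be audited/explained?
    easy_to_switch: bool       # Q6: low lock-in / portable prompts

def shortlist(options, max_latency_ms, max_cost,
              needs_compliance, needs_explainability):
    """Apply the six questions in order, narrowing the pool at each step."""
    survivors = [m for m in options if m.fits_use_case]                        # Q1
    survivors = [m for m in survivors if m.p50_latency_ms <= max_latency_ms]   # Q2
    if needs_compliance:
        survivors = [m for m in survivors if m.compliant]                      # Q3
    survivors = [m for m in survivors if m.cost_per_m_tokens <= max_cost]      # Q4
    if needs_explainability:
        survivors = [m for m in survivors if m.explainable]                    # Q5
    # Q6: prefer low-switching-cost options among what remains
    return sorted(survivors, key=lambda m: not m.easy_to_switch)
```

The ordering matters: hard constraints (use case, latency, compliance) eliminate options before softer trade-offs (cost, explainability, switching) rank what is left.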

Tools & Platforms

Foundation Models

  • Claude (Anthropic) -- 97.8% security compliance for code generation; 200K token context windows; 90% cache hit discount.
  • GPT (OpenAI) -- 85.4% multimodal MMMU benchmark leader; SOC 2, HIPAA with signed BAAs.
  • Gemini (Google) -- 87.6% on Video-MMMU; 2M token context windows; 99.9% SLA commitments.
  • DeepSeek -- $0.07/million tokens with cache hits; 671B parameter Mixture of Experts model trained for ~$6M.
  • Llama (Meta) -- Open-weight model family; Shopify runs 40-60 million daily inferences on LLaVA, a fine-tuned Llama-based vision model.
  • Mistral -- Leads time-to-first-token at 0.30 seconds; open-weight models for European regulatory environments.

Providers & Cloud Wrappers

  • OpenAI API -- Ecosystem leader; SOC 2, HIPAA with BAAs; Batch API offers 50% discount.
  • Anthropic API -- Prompt caching at 90% discount; safety-focused; rate limit considerations.
  • Google Vertex AI -- 99.9% SLA with financial credits; FedRAMP High authorization.
  • Azure OpenAI Service -- Enterprise wrapper with Azure compliance and private networking.
  • AWS Bedrock -- Multi-model access through AWS infrastructure; pre-negotiated cloud discounts.
  • together.ai -- Third-party host for open-source models with different cost/performance trade-offs.
  • Fireworks AI -- Model serving platform; helped Notion achieve 350ms latency.
  • Replicate -- Open-source model hosting with pay-per-use pricing.
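The cache and batch discounts quoted above change effective pricing significantly, so it is worth doing the blended arithmetic. A simplified sketch using the rates from the Anthropic entry (90% discount on cache hits, 25% premium on cache writes); real pricing applies the write premium only to tokens newly written to the cache, so treat this as an upper-bound estimate:

```python
def effective_input_price(base_price, cache_hit_rate,
                          hit_discount=0.90, write_premium=0.25):
    """Blend cached and uncached input-token prices per million tokens.

    Simplified model: every non-hit token is assumed to pay the cache-write
    premium, which overstates cost slightly versus real billing.
    """
    hit_price = base_price * (1 - hit_discount)      # discounted cache hits
    miss_price = base_price * (1 + write_premium)    # misses pay write premium
    return cache_hit_rate * hit_price + (1 - cache_hit_rate) * miss_price
```

At a hypothetical $3.00/M base price with an 80% hit rate, the blended price works out to $0.99/M, roughly a third of list price, which is why cache hit rate is often the single biggest lever on API spend.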

Aggregators & Routing

  • OpenRouter -- Single API to 100+ models; 5% markup; :nitro for speed, :floor for lowest price.
  • LiteLLM -- Open-source self-hosted gateway; 3ms P50 / 17ms P90 latency overhead.
  • Portkey -- Enterprise LLM gateway starting at $49/month; 250+ LLMs; semantic caching and audit trails.
  • RouteLLM -- Research showing 85% cost reduction while maintaining 90-95% of GPT-4 quality.
  • Helicone -- LLM observability platform; built-in caching can cut costs 20-30%.
  • Langfuse -- Open-source LLM observability and evaluation platform.
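The gateways above implement two patterns worth understanding even if you never build them yourself: cost-based routing (the RouteLLM idea of sending easy prompts to a cheap model) and ordered fallback (what LiteLLM and OpenRouter automate). A toy sketch of both, with hypothetical model names and a made-up complexity score:

```python
def route(complexity: float, threshold: float = 0.5) -> str:
    """RouteLLM-style idea: cheap model for easy prompts, frontier model
    for hard ones. The complexity score and names are illustrative."""
    return "cheap-model" if complexity < threshold else "frontier-model"

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try (name, callable) provider pairs in order until one succeeds."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In production the router's "complexity" signal is usually a small classifier trained on preference data; the fallback chain is where the 85% cost-reduction figure from RouteLLM's research comes from, since most traffic never reaches the expensive model.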

Fine-Tuning & Evaluation

  • LoRA (Low-Rank Adaptation) -- Reduces GPU memory requirements by up to 3x for fine-tuning.
  • PromptLayer -- Prompt versioning and management as code.
  • Maxim -- Prompt versioning and AI evaluation platform.
  • Vespa -- Distributed search and ranking engine; powers Perplexity's 200 billion URL index.
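LoRA's savings come from simple arithmetic: instead of updating a full d-by-k weight matrix, it trains two low-rank factors of rank r, so trainable parameters drop from d*k to r*(d + k). A minimal sketch of that count (the "up to 3x" GPU-memory figure above reflects whole-run savings on gradients and optimizer state, not this per-matrix ratio):

```python
def lora_trainable_params(d: int, k: int, r: int):
    """Trainable parameters for one d-by-k weight matrix:
    full fine-tuning updates d*k values; rank-r LoRA trains
    factors B (d x r) and A (r x k), i.e. r*(d + k) values."""
    full = d * k
    lora = r * (d + k)
    return full, lora, full / lora
```

For a hypothetical 4096x4096 attention projection at rank 8, LoRA trains 65,536 parameters instead of about 16.8 million, a 256x reduction for that matrix.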

Provider Comparison Summary

| Provider      | Strength                 | Key SLA                      | Cache Discount                 |
| ------------- | ------------------------ | ---------------------------- | ------------------------------ |
| OpenAI        | Ecosystem depth, brand   | ~99.3% uptime                | 50% automatic                  |
| Anthropic     | Safety, long context     | Rate limit tiers             | 90% on hits (25% write premium)|
| Google Vertex | Enterprise SLA           | 99.9% with financial credits | Varies                         |
| DeepSeek      | Cost ($0.07/M tokens)    | No enterprise SLA            | N/A                            |
| Meta (Llama)  | Open weights, self-host  | Self-managed                 | N/A                            |