Chapter 4: Infrastructure for AI-First Operations -- Resources
Curated resources for deeper exploration of topics covered in this chapter.
Frameworks from This Chapter
- 5 Infrastructure Mistakes That Kill AI Initiatives -- Over-engineering early, single points of failure, no observability, ignoring cost signals, and security as an afterthought.
Tools & Platforms
Day 1 Stack (Under $500/month)
- Vercel -- Serverless deployment platform; auto-injects Supabase credentials and unifies billing.
- Supabase -- Managed PostgreSQL with built-in auth, pgvector support, and real-time capabilities; 1.7 million developers.
- Flask -- Python micro web framework; Yirifi's backend choice for all 15 microsites.
- HTMX -- HTML-first frontend approach; no React or complex frontend frameworks required.
Databases & Storage
- PostgreSQL -- Primary relational database; with pgvector and pgvectorscale it reaches 471 QPS at 99% recall on 50M vectors in the Tigerdata benchmark.
- pgvector -- PostgreSQL extension for vector similarity search; with pgvectorscale, reported 11.4x higher throughput than a dedicated vector database in the same Tigerdata benchmark.
- pgvectorscale -- Enhanced pgvector performance from Timescale.
- Redis -- In-memory caching and session management; add it once the same data is read 10 or more times per write.
- Pinecone -- Managed vector database; cost-effective at roughly $100-200/month while volume stays below the 80M queries/month threshold.
- Qdrant -- Open-source vector database for self-hosting at scale.
- Milvus -- Open-source vector database designed for billion-scale similarity search.
- Weaviate -- Open-source vector database with built-in ML model integrations.
- Neo4j -- Graph database for relationship-heavy workloads (knowledge graphs, recommendation systems).
- MongoDB -- Document store for flexible schema requirements beyond PostgreSQL JSONB.
- SQLite -- Lightweight database; used by Yirifi for ontology knowledge graph.
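
The Redis rule of thumb in this list (add a cache once the same data is read 10 or more times per write) can be checked against access counts you already collect. The sketch below is one possible way to express it; the function name, the `min_ratio` parameter, and the sample counts are hypothetical, not part of any library.

```python
def should_add_cache(reads: int, writes: int, min_ratio: float = 10.0) -> bool:
    """Return True when the read-to-write ratio meets the 10x rule of thumb."""
    if reads < 0 or writes < 0:
        raise ValueError("counts must be non-negative")
    if writes == 0:
        # Assumption: data that is read but never rewritten is always worth caching.
        return reads > 0
    return reads / writes >= min_ratio


# Hypothetical sample: 1,000 reads against 50 writes is a 20x ratio.
print(should_add_cache(reads=1000, writes=50))
```

Treat the result as a prompt to measure, not a verdict: hit rate and invalidation cost still decide whether the cache pays off.
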
Security & Auth
- Lasso Security MCP Gateway -- First open-source MCP security gateway; proxy and orchestrator embedding security filters across MCP servers.
- Auth0 for AI Agents -- Agent-specific authentication flows from Okta/Auth0.
- Microsoft Entra Agent ID -- Dedicated identity types for AI agents; same conditional access as human users.
- Model Context Protocol (MCP) -- De facto standard for agent-tool communication; adopted by Anthropic, OpenAI, Google, and Microsoft.
Observability & Cost Tracking
- Helicone -- LLM observability with built-in caching (20-30% cost reduction), at the cost of 50-80ms added latency.
- Langfuse -- Open-source LLM observability platform.
- LangSmith -- LLM monitoring and evaluation from LangChain.
- CloudZero -- AI cost tracking; research found only 51% of organizations can evaluate AI ROI.
Further Reading
- How Instacart Built Modern Search Infrastructure on Postgres -- Pushed pgvector to 1 billion embeddings; 6% drop in zero-result searches after migration from FAISS.
- Linear's Custom Sync Engine -- Why Linear built custom sync (competitive advantage) but buys managed PostgreSQL (commodity).
- Advanced Authentication and Authorization for MCP Gateway (Red Hat) -- The `x-authorized-tools` JWT wristband pattern for gateway enforcement.
- MCP First Anniversary -- History and evolution of the Model Context Protocol standard.
- Claude Code Documentation -- Claude Code as both MCP client and server; headless mode for CI/CD automation.
- The Complete Guide to LLM Observability Platforms -- LLM observability market projected from $1.4B (2023) to $10.7B (2033).
- Notion: Building a Scalable AI Feature Evaluation System -- Notion's hundreds of evaluation datasets with LLM-as-judge scoring.
Research & Data
- MIT Study: 95% of GenAI Pilots Fail -- $30-40 billion invested in 2024 pilots; infrastructure decisions killed good ideas before shipping.
- Adversa AI: 2025 AI Security Incidents Report -- 73% of enterprises experienced AI-related security breaches; $4.8M average incident cost.
- Cloud Security Alliance: Agentic AI Identity and Access Management -- Non-human identities outnumber humans 50:1 in enterprise environments.
- Gartner Prediction: 25% of Breaches from AI Agent Abuse by 2028 -- Via Strata Identity analysis.
- CloudZero State of AI Costs 2025 -- Average monthly AI spend jumped from $62,964 to $85,521 (36% YoY); 45% of companies spending $100K+/month.
- pgvector vs Qdrant Benchmarks (Tigerdata) -- pgvectorscale achieving 471 QPS at 99% recall on 50M vectors.
- OAuth Token Exchange RFC 8693 -- Delegation chain specification for agent-to-agent permission passing.
- DPoP (Demonstration of Proof-of-Possession) -- Cryptographic proof preventing stolen token reuse.
- Global Market Insights: Vector Database Market -- Vector database market reached $2.2B in 2024.
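
For the RFC 8693 entry in this list, the delegation request is an ordinary form-encoded POST body sent to the authorization server's token endpoint. The sketch below builds such a body with the Python standard library. The `grant_type` and token-type URNs come from the RFC; the function name, token strings, audience URL, and scope are placeholders chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token: str, actor_token: str,
                              audience: str, scope: str) -> str:
    """Form-encode a delegation request: the actor (agent) asks to act for the subject (user)."""
    params = {
        "grant_type": GRANT_TYPE,
        "subject_token": subject_token,          # token of the party being represented
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": actor_token,              # token of the agent doing the acting
        "actor_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                    # service the new token is intended for
        "scope": scope,                          # requested (ideally narrower) permissions
    }
    return urlencode(params)


# Placeholder values only; real tokens come from your identity provider.
body = build_token_exchange_body(
    subject_token="user-token-placeholder",
    actor_token="agent-token-placeholder",
    audience="https://tools.example.com",
    scope="tools:read",
)
fields = parse_qs(body)  # round-trip to confirm the encoding
```

Requesting a narrower `scope` at each hop is what keeps a delegation chain from accumulating permissions as it passes between agents.
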
Community & Learning
- Model Context Protocol Specification -- Official MCP spec including OAuth 2.1 authorization.
- GitHub Actions: Claude Code Action -- Mention `@claude` in PRs/issues to trigger AI analysis with gateway controls.
- Supabase Auth: Build vs Buy -- Analysis of authentication build vs buy economics.
Infrastructure Decision Thresholds
| Component | Buy Threshold | Build/Self-Host Threshold |
|---|---|---|
| Vector Database | < 80M queries/month | > 80-100M queries/month |
| AI Gateway | < $10K/month LLM spend | > $10K/month LLM spend |
| Authentication | Always buy (security risk) | Only delegation logic custom |
| Observability | < 50K events/month | > 50K events/month with DevOps capacity |
| General AI Infra | Pre-product-market fit | Scale stage (18+ months) |
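
The numeric rows of the table can be encoded as simple checks, which makes them easy to revisit as usage grows. This is a rough sketch, not a decision engine: the function names and return labels are hypothetical, and treating the 80-100M band as "evaluate" and exactly $10K as "build" are assumptions, since the table does not settle those edge cases.

```python
def vector_db(queries_per_month: int) -> str:
    """Buy below 80M queries/month; self-host above the 80-100M band."""
    if queries_per_month < 80_000_000:
        return "buy"
    if queries_per_month > 100_000_000:
        return "build"
    return "evaluate"  # assumption: the 80-100M band is a judgment call


def ai_gateway(llm_spend_usd_per_month: float) -> str:
    """Buy below $10K/month of LLM spend; otherwise consider building."""
    return "buy" if llm_spend_usd_per_month < 10_000 else "build"


def observability(events_per_month: int, has_devops_capacity: bool) -> str:
    """Self-host only above 50K events/month and with DevOps capacity."""
    if events_per_month > 50_000 and has_devops_capacity:
        return "build"
    return "buy"


def authentication() -> str:
    """Always buy; only the delegation logic is custom."""
    return "buy"


# Hypothetical workload: 90M vector queries, $4K LLM spend, 200K events, no DevOps team.
print(vector_db(90_000_000), ai_gateway(4_000), observability(200_000, False))
```

The "General AI Infra" row is left out on purpose: product-market fit and company stage are qualitative calls that a threshold check would misrepresent.
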