Chapter 4: Infrastructure for AI-First Operations -- Resources

Curated resources for deeper exploration of topics covered in this chapter.

Frameworks from This Chapter

5 Infrastructure Mistakes That Kill AI Initiatives -- Over-engineering early, single points of failure, no observability, ignoring cost signals, and security as an afterthought.

Vercel -- Serverless deployment platform; auto-injects Supabase credentials and unifies billing.
Supabase -- Managed PostgreSQL with built-in auth, pgvector support, and real-time capabilities; used by 1.7 million developers.
Flask -- Python micro web framework; Yirifi's backend choice for all 15 microsites.
HTMX -- HTML-first frontend approach; no React or complex frontend frameworks required.

PostgreSQL -- Primary relational database; with pgvector and pgvectorscale, reaches 471 QPS at 99% recall on 50M vectors.
pgvector -- PostgreSQL extension for vector similarity search; combined with pgvectorscale, benchmarked at 11.4x the throughput of Qdrant at 99% recall.
pgvectorscale -- Enhanced pgvector performance from Timescale.
Redis -- In-memory caching and session management; add when the same data is read 10 or more times per write.
Pinecone -- Managed vector database; cost-effective at roughly $100-200/month while usage stays below about 80M queries/month.
Qdrant -- Open-source vector database for self-hosting at scale.
Milvus -- Open-source vector database designed for billion-scale similarity search.
Weaviate -- Open-source vector database with built-in ML model integrations.
Neo4j -- Graph database for relationship-heavy workloads (knowledge graphs, recommendation systems).
MongoDB -- Document store for flexible schema requirements beyond PostgreSQL JSONB.
SQLite -- Lightweight database; used by Yirifi for ontology knowledge graph.

Lasso Security MCP Gateway -- First open-source MCP security gateway; proxy and orchestrator embedding security filters across MCP servers.
Auth0 for AI Agents -- Agent-specific authentication flows from Okta/Auth0.
Microsoft Entra Agent ID -- Dedicated identity types for AI agents; same conditional access as human users.
Model Context Protocol (MCP) -- De facto standard for agent-tool communication; introduced by Anthropic and adopted by OpenAI, Google, and Microsoft.

Helicone -- LLM observability with built-in caching (20-30% cost reduction); 50-80ms latency trade-off.
Langfuse -- Open-source LLM observability platform.
LangSmith -- LLM monitoring and evaluation from LangChain.
CloudZero -- AI cost tracking; research found only 51% of organizations can evaluate AI ROI.

MIT Study: 95% of GenAI Pilots Fail -- $30-40 billion invested in 2024 pilots; infrastructure decisions killed good ideas before shipping.
Adversa AI: 2025 AI Security Incidents Report -- 73% of enterprises experienced AI-related security breaches; $4.8M average incident cost.
Cloud Security Alliance: Agentic AI Identity and Access Management -- Non-human identities outnumber humans 50:1 in enterprise environments.
Gartner Prediction: 25% of Breaches from AI Agent Abuse by 2028 -- Via Strata Identity analysis.
CloudZero State of AI Costs 2025 -- Average monthly AI spend jumped from $62,964 to $85,521 (36% YoY); 45% of companies spending $100K+/month.
pgvector vs Qdrant Benchmarks (TigerData) -- pgvector with pgvectorscale achieved 471 QPS at 99% recall on 50M vectors.
OAuth Token Exchange RFC 8693 -- Delegation chain specification for agent-to-agent permission passing.
DPoP (Demonstration of Proof-of-Possession) -- Cryptographic proof preventing stolen token reuse.
Global Market Insights: Vector Database Market -- Vector database market reached $2.2B in 2024.
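
To make the RFC 8693 entry above more concrete, here is a minimal sketch of what a delegation-style token exchange request body might look like when an agent acts on behalf of a user. The grant type and token type URIs come from the RFC; the function name, token values, audience URL, and scope are hypothetical placeholders, and no particular authorization server is assumed.

```python
from urllib.parse import urlencode, parse_qs

# Grant type and token type identifiers defined in RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(subject_token: str, actor_token: str,
                              audience: str, scope: str) -> str:
    """Form-encode a delegation request (hypothetical helper).

    subject_token identifies the party being acted for (for example a user);
    actor_token identifies the party doing the acting (for example an agent).
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": actor_token,
        "actor_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
        "scope": scope,
    }
    return urlencode(params)


# All values below are placeholders for illustration only.
body = build_token_exchange_body(
    subject_token="user-token-placeholder",
    actor_token="agent-token-placeholder",
    audience="https://tools.example.com",
    scope="read:documents",
)

# Round-trip the body to confirm the fields were encoded as expected.
print(parse_qs(body)["grant_type"][0])
```

The resulting string would be sent as an `application/x-www-form-urlencoded` POST body to the authorization server's token endpoint; how the server validates the delegation chain is outside the scope of this sketch.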

Model Context Protocol Specification -- Official MCP spec including OAuth 2.1 authorization.
GitHub Actions: Claude Code Action -- Mention @claude in PRs/issues to trigger AI analysis with gateway controls.
Supabase Auth: Build vs Buy -- Analysis of authentication build vs buy economics.

Component	Buy Threshold	Build/Self-Host Threshold
Vector Database	< 80M queries/month	> 80-100M queries/month
AI Gateway	< $10K/month LLM spend	> $10K/month LLM spend
Authentication	Always buy (security risk)	Only delegation logic custom
Observability	< 50K events/month	> 50K events/month with DevOps capacity
General AI Infra	Pre-product-market fit	Scale stage (18+ months)
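
The thresholds in the table can be read as a simple lookup. The sketch below is one hypothetical way to encode them in Python; the `Workload` fields and the `recommend` function are illustrative names, not from the chapter. Two assumptions are worth flagging: the 80M to 100M vector query band is treated as a gray zone where either choice may be reasonable, and spend of exactly $10K/month (which the table does not cover) is treated as the build side.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical snapshot of usage metrics; field names are illustrative."""
    vector_queries_per_month: int = 0
    llm_spend_usd_per_month: float = 0.0
    observability_events_per_month: int = 0
    has_devops_capacity: bool = False


def recommend(component: str, w: Workload) -> str:
    """Return 'buy', 'build', or 'either' using the table's thresholds."""
    if component == "vector_database":
        if w.vector_queries_per_month < 80_000_000:
            return "buy"
        if w.vector_queries_per_month <= 100_000_000:
            # Assumption: the table lists 80-100M as a range, so this band is a gray zone.
            return "either"
        return "build"
    if component == "ai_gateway":
        # Assumption: exactly $10K/month falls on the build side.
        return "buy" if w.llm_spend_usd_per_month < 10_000 else "build"
    if component == "authentication":
        # Table: always buy; only delegation logic is written in-house.
        return "buy"
    if component == "observability":
        # Self-hosting needs both the event volume and the DevOps capacity.
        if w.observability_events_per_month > 50_000 and w.has_devops_capacity:
            return "build"
        return "buy"
    raise ValueError(f"unknown component: {component}")


# Example: a small team with modest usage and no dedicated DevOps staff.
small_team = Workload(
    vector_queries_per_month=10_000_000,
    llm_spend_usd_per_month=2_500,
    observability_events_per_month=200_000,
)
for name in ("vector_database", "ai_gateway", "authentication", "observability"):
    print(name, "->", recommend(name, small_team))
```

The "General AI Infra" row is left out because it depends on company stage rather than a measurable usage metric.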