# Background Agent
Autonomous task processor with checkpointing, monitoring, and resource budgets.
Book reference: Chapter 6 - Agent Architecture, Sections 1 and 6
## What This Demonstrates
Background agents execute well-defined tasks without human supervision. No one is watching. No one is waiting. They are triggered by schedules, events, or API calls -- not human messages.
This example implements the six background agent design patterns from the book:
- Idempotency - Safe to run the same task twice (idempotency keys prevent duplicates)
- Checkpointing - Saves progress after each task; resumes from interruption (see the sketch after this list)
- Alerting - Dead man's switch, failure escalation, budget alarms
- Audit logging - Every task start, completion, and failure is logged
- Graceful degradation - Failed tasks are quarantined; processing continues
- Resource budgeting - Token and cost limits prevent runaway execution
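The first two patterns work together: a checkpoint records which idempotency keys have already completed, so a restarted run skips finished work instead of redoing it. Below is a minimal file-based sketch; names like `CHECKPOINT_PATH` and `Checkpoint` are illustrative, not the actual structures in `tasks.py`.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path

CHECKPOINT_PATH = Path("checkpoint.json")  # hypothetical location, not the repo's actual path

@dataclass
class Checkpoint:
    # idempotency keys of tasks that have already completed
    completed_keys: set[str] = field(default_factory=set)

def load_checkpoint() -> Checkpoint:
    """Resume from the last saved checkpoint, or start fresh if none exists."""
    if CHECKPOINT_PATH.exists():
        data = json.loads(CHECKPOINT_PATH.read_text())
        return Checkpoint(completed_keys=set(data["completed_keys"]))
    return Checkpoint()

def save_checkpoint(cp: Checkpoint) -> None:
    """Persist progress after every task so a crash loses at most one task's work."""
    CHECKPOINT_PATH.write_text(json.dumps({"completed_keys": sorted(cp.completed_keys)}))

def already_done(idempotency_key: str, cp: Checkpoint) -> bool:
    """Idempotency check: a key seen in a previous run is skipped, not reprocessed."""
    return idempotency_key in cp.completed_keys
```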
## Files
| File | Purpose |
|---|---|
| `agent.py` | Main processing loop with retry logic and budget enforcement |
| `tasks.py` | Task definitions, queue management, and checkpointing |
| `monitor.py` | Monitoring, alerting, and the dead man's switch pattern |
| `config.py` | Configuration via environment variables |
Uses the shared provider library at `examples/shared/` for LLM access via OpenRouter.
## Quick Start
```bash
# Install dependencies
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env and add your OpenRouter API key (https://openrouter.ai/keys)

# Run
python agent.py
```
## How It Works
```
[Load checkpoint if exists]
|
v
[For each pending task:]
|
+-- Check budget --> [STOP if exhausted]
|
+-- Record heartbeat
|
+-- Process task via LLM (OpenRouter)
| |
| +-- Success --> [Mark completed, log result]
| |
| +-- Failure --> [Retry with backoff]
| |
| +-- Max retries --> [Quarantine task, alert]
|
+-- Save checkpoint
|
v
[Report summary]
```
The agent processes a queue of tasks sequentially. After each task (success or failure), it saves a checkpoint to disk. If the process crashes, restarting picks up from the last checkpoint. Tasks that exceed the retry limit are quarantined rather than blocking the entire queue.
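The control flow above compresses into a short sketch. This is not the repo's `agent.py`; `llm_call` and `alert` are stand-ins, and the heartbeat and checkpoint calls are shown as comments.

```python
import asyncio
import random

MAX_RETRIES = 3
TOKEN_BUDGET = 50_000

async def llm_call(task: str) -> tuple[str, int]:
    """Placeholder for the real OpenRouter call; returns (result, tokens used)."""
    await asyncio.sleep(0.05)
    return f"result for {task!r}", random.randint(200, 800)

def alert(message: str) -> None:
    """Placeholder for monitor.py alerting (Slack, PagerDuty, ...)."""
    print(f"ALERT: {message}")

async def run(tasks: list[str], completed: set[str]) -> None:
    tokens_used = 0
    for task in tasks:
        if task in completed:               # idempotency: skip work finished in a prior run
            continue
        if tokens_used >= TOKEN_BUDGET:     # resource budgeting: stop before costs spiral
            alert("token budget exhausted")
            break
        # record_heartbeat() would go here to feed the dead man's switch
        for attempt in range(MAX_RETRIES):
            try:
                result, tokens = await llm_call(task)
                tokens_used += tokens
                completed.add(task)         # mark completed (audit log entry in the real agent)
                break
            except Exception:
                await asyncio.sleep(2 ** attempt)   # back off before the next attempt
        else:
            alert(f"quarantined {task!r} after {MAX_RETRIES} failed attempts")
        # save_checkpoint(completed) would persist progress here, success or failure

asyncio.run(run(["summarize report A", "summarize report B"], completed=set()))
```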
## Key Design Decisions
- Provider abstraction via the `shared/` library. Swap providers by changing `LLM_PROVIDER` in `.env`.
- File-based checkpointing for simplicity. Production systems should use Temporal, Redis, or a database.
- Idempotency keys on each task prevent duplicate processing across restarts.
- Token and cost budgets stop the agent before costs spiral. From the book: "This isn't optional -- it's the difference between a manageable mistake and a resignation letter."
- Dead man's switch monitors heartbeats (sketched below). If the agent goes silent, the monitor fires a critical alert.
- Temperature 0.0 for deterministic outputs on batch tasks.
- Async architecture using `asyncio` for non-blocking LLM calls.
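The dead man's switch is worth spelling out because it inverts the usual alerting logic: the alarm fires when nothing happens. A minimal sketch, assuming heartbeats are written to a timestamp file; the real `monitor.py` may store them differently.

```python
import time
from pathlib import Path

HEARTBEAT_FILE = Path("heartbeat.txt")   # hypothetical location
MAX_SILENCE_SECONDS = 120                # fire if the agent is quiet longer than this

def record_heartbeat() -> None:
    """Called by the agent between tasks to prove it is still alive."""
    HEARTBEAT_FILE.write_text(str(time.time()))

def check_dead_mans_switch() -> None:
    """Run periodically by the monitor; a stale or missing heartbeat is a critical alert."""
    if not HEARTBEAT_FILE.exists():
        print("CRITICAL: no heartbeat recorded yet")
        return
    silence = time.time() - float(HEARTBEAT_FILE.read_text())
    if silence > MAX_SILENCE_SECONDS:
        print(f"CRITICAL: agent silent for {silence:.0f}s")
```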
## Configuration
| Variable | Default | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | (required) | OpenRouter API key |
| `LLM_PROVIDER` | `openrouter` | Provider name (`openrouter` or `openai`) |
| `MODEL` | `google/gemini-2.5-flash` | Model to use |
| `BG_AGENT_MAX_RETRIES` | `3` | Max retries per task |
| `BG_AGENT_TIMEOUT` | `300` | Task timeout in seconds |
| `BG_AGENT_POLL_INTERVAL` | `10` | Polling interval in seconds |
| `BG_AGENT_TOKEN_BUDGET` | `50000` | Max tokens per run |
| `BG_AGENT_COST_BUDGET` | `1.00` | Max cost (USD) per run |
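Loading these is plain environment-variable parsing. An illustrative version with defaults matching the table; the real `config.py` may structure this differently.

```python
import os

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]          # required; raises KeyError if missing
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openrouter")
MODEL = os.getenv("MODEL", "google/gemini-2.5-flash")
MAX_RETRIES = int(os.getenv("BG_AGENT_MAX_RETRIES", "3"))
TIMEOUT_SECONDS = int(os.getenv("BG_AGENT_TIMEOUT", "300"))
POLL_INTERVAL_SECONDS = int(os.getenv("BG_AGENT_POLL_INTERVAL", "10"))
TOKEN_BUDGET = int(os.getenv("BG_AGENT_TOKEN_BUDGET", "50000"))
COST_BUDGET_USD = float(os.getenv("BG_AGENT_COST_BUDGET", "1.00"))
```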
## Sample Tasks
The demo includes five sample tasks across three types (a minimal task shape is sketched after the list):
- `summarize` - Condense text into one sentence
- `analyze_sentiment` - Classify sentiment with confidence score
- `extract_entities` - Pull named entities from text
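A task in this style typically carries its type, its payload, and a stable idempotency key. The field names below are illustrative, not the exact schema in `tasks.py`.

```python
from dataclasses import dataclass

@dataclass
class Task:
    idempotency_key: str   # stable ID so restarted runs skip already-completed work
    task_type: str         # "summarize" | "analyze_sentiment" | "extract_entities"
    payload: str           # the text to process

sample = Task(
    idempotency_key="demo-summarize-001",
    task_type="summarize",
    payload="Background agents run unattended, so they must checkpoint and alert.",
)
```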
## Extending This Example
- Replace file-based queue with Redis, SQS, or Temporal workflows
- Add a scheduler (cron, APScheduler) for periodic execution
- Connect the monitor to Slack, PagerDuty, or a dashboard
- Add parallel task processing with `asyncio.gather()`
- Implement exponential backoff between retries (both sketched below)
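The last two bullets fit together naturally: bounded concurrency via `asyncio.gather()` plus exponential backoff with jitter. A sketch, with `process_task` standing in for the per-task logic shown earlier.

```python
import asyncio
import random

async def process_task(task: str) -> str:
    """Placeholder for the per-task LLM work."""
    await asyncio.sleep(0.1)
    return f"done: {task}"

async def with_retries(task: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            return await process_task(task)
        except Exception:
            # exponential backoff with jitter: ~1s, ~2s, ~4s between attempts
            await asyncio.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"task {task!r} exhausted retries")

async def run_parallel(tasks: list[str], max_concurrency: int = 4) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)      # cap simultaneous LLM calls
    async def bounded(task: str) -> str:
        async with sem:
            return await with_retries(task)
    return await asyncio.gather(*(bounded(t) for t in tasks))

print(asyncio.run(run_parallel(["a", "b", "c"])))
```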