What Background Jobs Taught Me About Reliable AI Agents

Moving 18 scheduled AI jobs from cloud to local looks like a cost exercise. It is really a reliability audit: which jobs can run unattended, and which ones need tighter boundaries.


Moving scheduled AI jobs from cloud inference to local models starts as a cost question and ends as a reliability question. The process forces you to examine each job: what does it actually do, how much does it need to know, and what happens when it fails silently? This is a field note from migrating 18 background jobs in one night — and what the process revealed about running agentic AI reliably.

At some point you look at your cron list and realize you've been paying Claude Sonnet prices for tasks that don't need Claude Sonnet. That's where I was on Friday night: 22 cron jobs running across my AI agent setup, a mix of models, some aliases that weren't resolving, and a growing suspicion that I'd been sloppy about this for a while. Time to fix it properly.

---

## What Was Running

A quick inventory of what my automated jobs actually do:

- System healthchecks every few hours
- Security alert monitors (three separate jobs)
- Morning briefs and daily summaries
- Weekly business reviews
- Project status update drafts
- Weekly training content generation
- Recurring personal messages to group chats
- Weekly project tracking and review
- Regression testing

That's 22 jobs. Some genuinely need a capable frontier model: you don't want your security monitors running on a flaky local setup, and long-form writing tasks benefit from Sonnet-level output. But healthchecks? Project trackers? Things that mostly read files and return structured summaries? Those don't need the expensive model.

The goal: move non-sensitive utility crons to local Devstral, keep security monitors on Claude Haiku (cheapest reliable cloud option), leave heavy writing jobs on Sonnet.

---

## Installing Devstral

```bash
ollama pull devstral-small-2
```

15GB download. Worth getting coffee.

Quick smoke test before touching any config:

```bash
ollama run devstral-small-2 "Hello, what model are you?"
```

Confirmed: Mistral AI identity, coherent response. Good start.
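Before wiring the model into any config, it can be worth confirming that Ollama has it under the name you expect. Untagged pulls are stored with the `:latest` tag, so the local name is normally `devstral-small-2:latest`. A minimal sketch; `has_local_model` is a hypothetical helper, and it assumes the default `ollama list` layout (one header row, model name in the first column).

```bash
# has_local_model NAME  (hypothetical helper)
# Reads `ollama list` output on stdin and succeeds if NAME is present.
# A bare name such as "devstral-small-2" also matches "devstral-small-2:latest".
# Assumes the default table layout: a header row, then the model name in column 1.
has_local_model() {
  awk -v want="$1" '
    NR > 1 && ($1 == want || $1 == (want ":latest")) { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (needs the Ollama server running):
#   ollama list | has_local_model devstral-small-2 && echo "devstral is installed"
```

An empty listing also fails the check, so if `ollama list` cannot reach the server (its error should go to stderr, not stdout), the check fails there too.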
Register the model with the agent:

```bash
openclaw config set models.providers.ollama.apiKey "ollama-local"
openclaw config set agents.defaults.models.ollama/devstral-small-2.alias "devstral"
openclaw config set agents.defaults.heartbeat.model "ollama/devstral-small-2"
```

**Critical: set reasoning to false for Devstral.**

```bash
openclaw config set models.options.ollama/devstral-small-2.reasoning false
```

Without this, OpenClaw sends the system prompt as a `"developer"` role message. Ollama ignores that role, so your agent runs without any of its instructions and you'll spend an hour wondering why it's behaving strangely.

Validate your config after every change:

```bash
python3 -c "import json; json.load(open('openclaw.json'))"
```

No output means the file parses cleanly. An error means fix it before restarting.

---

## The Migration

Here's a thing I learned: `openclaw cron set` doesn't exist. The command you want is:

```bash
openclaw cron edit --model
```

I needed to edit 18 crons. You can do this as a batch: pull all your cron IDs, build a loop, and run through them. I kept the three security monitor crons explicitly on `claude-haiku` rather than letting them drift. Security is not where you cut corners on model quality.

One cron threw an error during the migration: a one-shot reminder job with no `message` field. That's a different payload type (`systemEvent` rather than `agentTurn`), so it has no model field to set. Harmless, ignorable.
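The batch approach described above can be sketched as a dry run that writes one command per job to a file you review before executing. This is a sketch under assumptions, not OpenClaw's documented workflow: it assumes the cron ID is a positional argument to `openclaw cron edit` placed before `--model` (check the CLI's own help output), and that you have already collected the IDs, however you pull them, one per line in a hypothetical `cron-ids.txt`. `plan_migration` is a hypothetical helper; the model names are the ones used in this post.

```bash
# Dry-run planner for a batch model migration (hypothetical helper).
# Reads cron IDs on stdin, one per line, and prints one edit command per job.
# IDs given as arguments are security monitors and stay pinned to Haiku.
LOCAL_MODEL="ollama/devstral-small-2"
SECURITY_MODEL="claude-haiku"

plan_migration() {
  local id sec model
  # The `|| [ -n "$id" ]` keeps a final line that has no trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue            # ignore blank lines
    model="$LOCAL_MODEL"
    for sec in "$@"; do
      if [ "$id" = "$sec" ]; then
        model="$SECURITY_MODEL"
        break
      fi
    done
    # ASSUMPTION: `openclaw cron edit <id> --model <model>` argument order.
    # %q shell-quotes each value so the output is safe to run as a script.
    printf 'openclaw cron edit %q --model %q\n' "$id" "$model"
  done
}

# Usage: write the plan, read it, then run it.
#   plan_migration sec-1 sec-2 sec-3 < cron-ids.txt > migrate.sh
#   cat migrate.sh
#   bash migrate.sh
```

Generating a script instead of editing in the loop gives you a reviewable record of every change. Jobs with a `systemEvent` payload, like the one-shot reminder, have no model to set; drop their IDs from the list or expect the same harmless error.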
## The Final Stack

| Role | Model | Origin | API Cost |
|---|---|---|---|
| Main chat | Claude Sonnet | USA (Anthropic) | Per token |
| 18 background crons | Devstral Small 2 | France (Mistral) | Free (local) |
| Security monitors (3 crons) | Claude Haiku | USA (Anthropic) | ~$0.25/M input tokens |
| Code tasks | Codex | USA (OpenAI) | Per token |
| Deep work / long-form | Claude Opus | USA (Anthropic) | Included in Max plan |

---

## What to Watch After Deployment

**The first live local model run fires quickly.** If you're running a healthcheck cron on a short interval, you'll get a real test within minutes of deploying. Watch it.

**Monitor with:**

```bash
openclaw logs --follow
```

If you see `payload.model not allowed, falling back to agent defaults`, your alias isn't resolving. Double-check the alias config and confirm that Ollama is actually running.

**On the `qmd collection add skipped` warnings:** if you've set up QMD for memory, you'll see these on restart. They're normal: QMD is telling you the collections already exist, so it's skipping re-creation. Not an error.

**Tool call failures:** if tool calls are failing silently, verify you're using Ollama's native API endpoint (for chat, `/api/chat`), not the OpenAI-compatible `/v1` endpoint. The two behave differently, and the native API with `stream: false` is more reliable for agentic tool use.

---

## Cost Impact

18 crons running on cloud models, multiple times per day, with large system prompts: the costs compound fast. Moving them to local Devstral cuts that API spend to zero for those jobs. The electricity cost of running Devstral locally is a rounding error.

The three security monitors staying on Claude Haiku aren't free, but Haiku is the cheapest reliable Claude option. And "reliable" matters for security: you don't want a flaky local model missing an alert because it hallucinated the output format.

The math is straightforward: use local models where reliability is acceptable and the task is routine.
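To put rough numbers on "the costs compound fast", here is the back-of-envelope arithmetic as two small shell functions (both hypothetical helpers). Every input is a placeholder, not a measurement from my setup: 18 jobs, 4 runs a day, 20,000 input tokens per run, 30 days, and $3 per million input tokens. Substitute your own schedule, prompt sizes, and your provider's current pricing; output tokens are left out to keep it short.

```bash
# monthly_cost_cents JOBS RUNS_PER_DAY TOKENS_PER_RUN DAYS CENTS_PER_MILLION
# Prints estimated monthly input-token spend in whole cents
# (integer arithmetic, truncated). All arguments are your own estimates.
monthly_cost_cents() {
  local jobs=$1 runs=$2 tokens=$3 days=$4 cents_per_m=$5
  local total_tokens=$(( jobs * runs * tokens * days ))
  echo $(( total_tokens * cents_per_m / 1000000 ))
}

# as_dollars CENTS: format a cent amount as dollars, e.g. 12960 -> $129.60
as_dollars() {
  printf '$%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Placeholder inputs: 18 jobs x 4 runs/day x 20k tokens x 30 days at $3/M,
# i.e. 43.2M input tokens a month.
cents=$(monthly_cost_cents 18 4 20000 30 300)
as_dollars "$cents"   # prints $129.60
```

Because the estimate is a plain product of its inputs, halving the system prompt saves exactly as much as halving the schedule. For the jobs moved to Devstral the price term is zero, which is the point of the migration.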
Use cloud models where reliability is non-negotiable or the task genuinely needs frontier-level capability.

---

## Lessons Learned

**Use the CLI, not the JSON editor.** I started by hand-editing `openclaw.json`. I ended the session using `openclaw config set` and `openclaw config unset` for everything. The CLI validates as it writes; hand-edited JSON has no mercy for trailing commas.

**Know your Node versions.** If you're running a LaunchAgent-based gateway, it may be using a different Node version than your interactive shell. Native bindings (SQLite, LanceDB, anything compiled) need to match the version that's actually running the gateway, not the one you're typing commands into.

**The dual-account setup creates friction.** If you're running OpenClaw under a service account but installing tools under an admin account (for Homebrew and npm globals), plan for the permission boundaries. They will bite you if you don't.

**Back up before every change.** `cp openclaw.json openclaw.json.bak` takes three seconds. Recovering from a config parse error on restart does not.

**Friday nights are apparently when infrastructure gets rebuilt.** I'm not sure what that says about me.

---

*Hardware: local Mac Mini | OpenClaw | Node.js | Ollama at a local inference endpoint | Devstral Small 2 (24B, ~15GB) | Bun*
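The "back up before every change" habit combines naturally with the JSON validation one-liner from the setup section. A minimal sketch, assuming `python3` is available (as in that one-liner) and that you pass the config path yourself: `with_config_backup` is a hypothetical helper that copies the file to `.bak`, runs whatever command you give it, re-parses the file with the same strict `json.load` check, and restores the backup if either step fails.

```bash
# with_config_backup CONFIG CMD [ARGS...]  (hypothetical helper)
# 1. copy CONFIG to CONFIG.bak
# 2. run CMD with its arguments
# 3. re-parse CONFIG as JSON; restore the backup if CMD or the parse fails
with_config_backup() {
  local config="$1"; shift
  cp -- "$config" "$config.bak" || return 1

  if "$@" && python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$config"; then
    return 0
  fi

  echo "change failed or left invalid JSON; restoring $config.bak" >&2
  cp -- "$config.bak" "$config"
  return 1
}

# Usage with a CLI change (command taken from the setup section):
#   with_config_backup openclaw.json \
#     openclaw config set models.options.ollama/devstral-small-2.reasoning false
# Usage with a hand edit:
#   with_config_backup openclaw.json "${EDITOR:-nano}" openclaw.json
```

Since the CLI already validates as it writes, the re-parse mostly earns its keep on hand edits, which is exactly where trailing commas sneak in. The relative `openclaw.json` path simply mirrors the commands in this post; point it at wherever your config actually lives.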

The larger lesson: the real value of moving AI jobs to local inference is not the cost reduction but the audit it forces. You look at each job and ask: what context does this need, how much can go wrong if it fails silently, and does it need frontier capability or just reliable instruction-following? That audit is worth doing regardless of where you end up running the jobs.

