Generated 2026-06-20 by the plan-openclaw-upgrade dynamic workflow (5 agents: target-pin, internal runbook survey, risk-surface survey, changelog-delta research, synthesis). Current installed: 2026.5.22. Latest stable found: 2026.6.8 (2026-06-16). Recommendation: HOLD on 5.22. See §1.
Latest stable: 2026.6.8 (2026-06-16). Versions released after 5.22: 2026.5.26, 5.27, 5.28, 6.1, 6.5, 6.6, 6.8.
Headline-relevant changes for claw001:
- Version scheme is now YYYY.M.PATCH (the 3rd component is a monthly patch counter, not the calendar day; June floor pinned at 6.5).
- Auth profiles moved to SQLite.
- MCP tool-result block coercion (prevents Anthropic 400s).
- Anthropic extended-thinking recovery after prompt-cache expiry/restart.
- Google provider ID resolves to google-generative-ai (6.1).

Sources: local ~/openclaw git clone (tags + CHANGELOG.md), GitHub releases, docs.openclaw.ai.
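Because the third component is a patch counter rather than a calendar day, release tags are safer to compare numerically per component than as plain strings. A minimal sketch, assuming GNU `sort -V` is available; `version_lt` is a hypothetical helper name, not part of any openclaw tooling:

```bash
# Hypothetical helper: succeeds when version $1 sorts strictly before $2.
# Assumes GNU coreutils `sort -V` (version sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage: order two of the releases listed above.
# version_lt 2026.5.22 2026.6.6 && echo "6.6 is newer than the current pin"
```

A plain string comparison would misorder pairs such as 2026.5.9 and 2026.5.22, which is why the sketch leans on version sort.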
NO-GO on 2026.6.8. Conditional, caveated GO on 2026.6.6 only if a non-image driver forces a move.
Single most important reason: The image-gen routing nondeterminism (#90074, OPEN, "needs maintainer review") that pinned us to 5.22 is NOT fixed anywhere in 5.22→6.8. Upgrading buys zero on the original pin reason while adding three SQLite migrations (cron, sessions, auth-profiles) and a CLI exit-code semantics change that touch our fail-closed host scripts. Separately, 2026.6.8 is community-flagged "wait for next release" with a CRITICAL gateway-startup regression (#94570 ERR_MODULE_NOT_FOUND) that can brick the container, plus #90361 (memory_search reindex race → hits Coach) and #94033 (isolated-cron timeout → hits our compose-and-deliver crons).
Recommendation: stay on 5.22. The earlier gate also still applies: #88312 (multi-tool Codex turn death) must be confirmed fixed or reverted in the target changelog before ANY move, because codex is the farm's primary chat path. If you must move (and only after confirming #88312 is fixed), target 2026.6.6, never 6.8, and treat everything below as the 6.6 procedure (substitute 2026.6.6 for <TARGET>, so that v<TARGET> reads v2026.6.6). The plan below is written for the GO case so it's executable when the gate clears.
Hard pre-req for any GO: re-auth codex OAuth in the window opener (token expires ~2026-06-29, ~9 days out; single-use refresh can be spent during chaotic warmup).
<TARGET> = 2026.6.6 (or the first release after the #88312 fix); commands below write the tag as v<TARGET>, i.e. v2026.6.6. <CURRENT> = 2026.5.22 (tag v2026.5.22).
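The "~9 days out" figure for the codex token is plain date arithmetic and is worth recomputing on the day of the window. A small sketch, assuming GNU `date` (the `-d` flag); `days_until` is a hypothetical helper:

```bash
# Hypothetical helper: whole days from date $2 (default: now, UTC) to date $1.
# Assumes GNU date; BSD/macOS date uses different flags.
days_until() {
  target=$(date -u -d "$1" +%s) || return 1
  from=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (target - from) / 86400 ))
}

# days_until 2026-06-29 2026-06-20   # prints 9 (plan date to approximate expiry)
```

Called with one argument it measures from the current time, so partial days are truncated.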
P0 — Gate checks (abort if any fail):
# Confirm #88312 (multi-tool Codex turn death) reverted/fixed in TARGET changelog:
cd ~/openclaw && git fetch --tags && git log --oneline v2026.5.22..v<TARGET> | grep -iE '88312|codex.*turn|multi.?tool'
# Confirm TARGET is NOT a flagged release (re-check clawstat.us before window).
git tag -l 'v<TARGET>' # must print the tag; empty output means it does not exist
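The changelog grep prints matches but does not by itself stop the procedure when nothing matches. One way to make it fail closed is to wrap the match in a function and branch on its exit status. This is a sketch only: `changelog_mentions_fix` is a hypothetical name, and a matching commit subject is evidence to read, not proof that the fix landed.

```bash
# Hypothetical helper: reads `git log --oneline` output on stdin and succeeds
# only if some line matches the same pattern used in the P0 check.
changelog_mentions_fix() {
  grep -iE '88312|codex.*turn|multi.?tool'
}

# Usage (placeholders as in the plan):
#   git log --oneline v2026.5.22..v<TARGET> | changelog_mentions_fix \
#     || { echo "ABORT: no #88312 fix found in range" >&2; exit 1; }
```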
P1 — Codex OAuth re-auth (THE SPOF — do this first, before snapshot):
docker exec -it openclaw-openclaw-gateway-1 openclaw models auth login --provider openai-codex
# Device code, ~2min, Daniel's ChatGPT-Pro account. Gets a fresh long-lived token,
# dodges the 06-29 cliff AND the single-use-refresh race during warmup.
docker exec openclaw-openclaw-gateway-1 openclaw config get agents.defaults.model # sanity
P2 — Retired-model audit (doctor no-op insurance):
grep -E '"model"|"primary"|"fallbacks"' ~/.openclaw/openclaw.json
# Cross-ref against 6.6 changelog retired list; replace any retired ref via
# openclaw config set BEFORE building. Note 6.1 resolves Google → google-generative-ai;
# audit any Google/OpenRouter/Vertex provider IDs for canonical form.
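The cross-reference in P2 can be done mechanically once the retired model names have been copied out of the changelog by hand. A sketch assuming a plain-text file with one retired model ID per line; that file and the `find_retired_refs` name are hypothetical, and the changelog remains the source of truth.

```bash
# Hypothetical helper: print config lines that mention any retired model ID.
# $1 = config file (e.g. ~/.openclaw/openclaw.json)
# $2 = hand-maintained text file, one retired model ID per line (assumed;
#      it must contain no blank lines, since an empty pattern matches all)
# Exit status 0 means at least one retired reference was found.
find_retired_refs() {
  grep -E '"model"|"primary"|"fallbacks"' "$1" | grep -F -f "$2"
}

# find_retired_refs ~/.openclaw/openclaw.json retired-models.txt \
#   && echo "retired refs found: fix with 'openclaw config set' before building"
```

Like the grep it wraps, this is line-based, so model IDs inside a multi-line fallbacks array can be missed.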
P3 — Confirm patches still apply against TARGET (dry-run, no commit):
cd ~/openclaw && git stash -u 2>/dev/null; git checkout v<TARGET> -- Dockerfile docker-compose.yml
for p in ~/openclaw-ops/host-overrides/*.patch; do
patch -p1 --dry-run --forward < "$p" && echo "OK: $p" || echo "FAIL — regenerate: $p"
done
git checkout v2026.5.22 -- Dockerfile docker-compose.yml # restore working tree
git stash list # if the stash above saved local changes, bring them back with: git stash pop
If any FAIL: regenerate via git diff <file> > host-overrides/<name>.patch against the new tag. cloud-build.sh will bail loudly post-checkout otherwise, leaving a partial tree.
P4 — Headroom: df -h / → confirmed 26G free (want >20G). Good. Confirm .env pin is an explicit tag (:v2026.5.22, confirmed — not :latest).
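The headroom check can be scripted so that it gates the window rather than relying on a reading of `df -h`. A minimal sketch using POSIX `df -Pk`; the function names are hypothetical and the 20G floor is the figure stated above.

```bash
# Hypothetical helpers for the disk-headroom gate.
# avail_kb: available space (in 1K blocks) on the filesystem holding $1.
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_headroom: succeeds when $1 (KB available) is at least $2 GiB.
has_headroom() {
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# has_headroom "$(avail_kb /)" 20 || { echo "ABORT: under 20G free" >&2; exit 1; }
```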
# Prune Docker space first (disk insurance):
docker builder prune -af && docker image prune -f && df -h /
# Snapshot (writes state.tar.gz = full restorable + frozen agents/models/cron/plugins/devices txt):
~/openclaw-ops/scripts/snapshot-openclaw.sh "pre-v<TARGET>"
# Pre-upgrade extras:
mkdir -p ~/openclaw-ops/backups/pre-upgrade-extras
# (a) all 4 agents' auth-profiles.json — the codex-OAuth SPOF store (rewritten by 5.28/6.1 migration):
tar czf ~/openclaw-ops/backups/pre-upgrade-extras/auth-profiles-pre-v<TARGET>.tgz \
~/.openclaw/agents/*/agent/auth-profiles.json
# (b) baseline extension dir on CURRENT image for KEEP-list diff:
docker exec openclaw-openclaw-gateway-1 ls /app/dist/extensions/ \
> ~/openclaw-ops/backups/pre-upgrade-extras/extensions-v2026.5.22.txt
# (c) the pre-migration cron/session JSON stores (NEW: doctor migrates these to SQLite one-way, so capture them first):
tar czf ~/openclaw-ops/backups/pre-upgrade-extras/runtime-state-pre-v<TARGET>.tgz \
~/.openclaw/jobs.json ~/.openclaw/agents/*/sessions/sessions.json 2>/dev/null
# (d) crontab explicit copy (clean restore one-liner):
crontab -l > ~/openclaw-ops/backups/crontab.bak.upgrade-v<TARGET>
The tarball is the reliable rollback artifact; CLI text captures are best-effort under load.
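Since the tarball is the rollback artifact, it is worth confirming that it is readable and holds the expected members before touching the running gateway. A sketch using only standard `tar`; `tarball_contains` is a hypothetical helper, and the member pattern is whatever the archive is expected to hold.

```bash
# Hypothetical helper: succeeds if archive $1 lists cleanly and at least one
# member path matches the extended regex $2.
tarball_contains() {
  listing=$(tar -tzf "$1") || return 1
  printf '%s\n' "$listing" | grep -E -- "$2" >/dev/null
}

# tarball_contains ~/openclaw-ops/backups/pre-upgrade-extras/auth-profiles-pre-v<TARGET>.tgz \
#   'auth-profiles\.json$' || echo "snapshot incomplete: do not proceed" >&2
```

Capturing the listing first means a corrupt archive fails the check instead of being masked by the grep at the end of a pipeline.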
~/openclaw-ops/scripts/cloud-build.sh v<TARGET>
This automates: git fetch → reset Dockerfile/compose mods → git checkout v<TARGET> → re-apply host-overrides/cloudbuild.yaml + all 3 patches (patch -p1 --forward, bails with exit 1 if any fails) → gcloud builds submit.
Watch for the 3 known Cloud Build quirks (all pre-fixed, confirm they hold):
- # syntax=docker/dockerfile:1.6 directive applied → else build fails "/${OPENCLAW_BUNDLED_PLUGIN_DIR}" not found (BuildKit can't expand ${VAR} in --mount=source=). Do NOT "fix" via --build-arg or by dropping --cache-from.
- logging: GCS_ONLY + logsBucket in cloudbuild.yaml → claw-backup-writer SA lacks Cloud Logging read.
- --cache-from → cache-key breaks on the ARG-mount syntax.

Monitor (sequential, never stack docker exec):
gcloud builds list --region=northamerica-northeast2 --limit=3
# On failure, read logs from GCS (SA can't read Cloud Logging):
gcloud storage cat gs://clawsorg-claw001-backups/cloudbuild-logs/log-<build_id>.txt
Wall time ~9-12 min. Gate before deploy:
gcloud artifacts docker tags list \
northamerica-northeast2-docker.pkg.dev/clawsorg/openclaw/gateway
# Expect: SUCCESS, :v<TARGET> + :latest on new digest, AND :v2026.5.22 STILL TAGGED (rollback target).
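The "expect" line can be turned into a check that fails when the rollback tag has gone missing. A sketch that reads a tag listing on stdin and looks for each required tag as a standalone field; it assumes tag names appear as whitespace-delimited fields in the `gcloud` output, and `require_tags` is a hypothetical helper.

```bash
# Hypothetical helper: reads a tag listing on stdin; succeeds only if every
# tag given as an argument appears as a whitespace-delimited field.
require_tags() {
  listing=$(cat)
  for tag in "$@"; do
    printf '%s\n' "$listing" |
      awk '{ for (i = 1; i <= NF; i++) print $i }' |
      grep -Fx -- "$tag" >/dev/null || {
        echo "missing tag: $tag" >&2
        return 1
      }
  done
}

# gcloud artifacts docker tags list <repo>/gateway | require_tags v<TARGET> v2026.5.22 latest
```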
NEW_IMAGE=northamerica-northeast2-docker.pkg.dev/clawsorg/openclaw/gateway:v<TARGET>
# (11) Swap .env:
sed -i.bak.pre-upgrade 's|gateway:v2026.5.22|gateway:v<TARGET>|' ~/openclaw/.env
grep OPENCLAW_IMAGE ~/openclaw/.env # verify
# (12) Pause host crontab (maintenance window — backup already captured):
crontab -r
# (13) Pull:
cd ~/openclaw && docker compose pull openclaw-gateway
# (14) KEEP-list / extension-dir audit BEFORE recreating production:
docker run --rm --entrypoint='' "$NEW_IMAGE" ls /app/dist/extensions/ \
> ~/openclaw-ops/backups/pre-upgrade-extras/extensions-v<TARGET>.txt
diff ~/openclaw-ops/backups/pre-upgrade-extras/extensions-v2026.5.22.txt \
~/openclaw-ops/backups/pre-upgrade-extras/extensions-v<TARGET>.txt
# NOTE: 6.1 externalized Tokenjuice + Copilot to npm plugins; 6.6 externalized Llama.cpp.
# If a dir you depend on (KEEP="openai exa browser openrouter telegram memory-core
# image-generation-core") vanished or a new depended-on dir appeared → STOP,
# edit ~/.openclaw/gateway-start.sh KEEP= first (bind-mounted, no rebuild needed).
# (15) Recreate — up -d NOT restart (restart won't re-read .env):
docker compose up -d openclaw-gateway
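The diff shows what changed; the question that actually gates the recreate is whether every KEEP entry still exists in the new image. A sketch that checks a KEEP list against a captured listing; `missing_keep_dirs` is a hypothetical helper, and the KEEP value is the one quoted in the note above.

```bash
# Hypothetical helper: print each name from the space-separated list $1 that
# is absent from listing file $2 (one directory name per line).
# Succeeds only when nothing is missing.
missing_keep_dirs() {
  rc=0
  for name in $1; do
    grep -Fxq -- "$name" "$2" || { echo "$name"; rc=1; }
  done
  return "$rc"
}

# KEEP="openai exa browser openrouter telegram memory-core image-generation-core"
# missing_keep_dirs "$KEEP" ~/openclaw-ops/backups/pre-upgrade-extras/extensions-v<TARGET>.txt \
#   || echo "STOP: edit KEEP= in ~/.openclaw/gateway-start.sh first" >&2
```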
Doctor + version verify (STOP gate):
# Wait healthy:
until docker ps --filter name=gateway-1 --format '{{.Status}}' | grep -qF '(healthy)'; do sleep 5; done # match "(healthy)" exactly; plain "healthy" also matches "(unhealthy)"
docker exec openclaw-openclaw-gateway-1 openclaw --version # expect TARGET
# CRITICAL timing: do NOT run doctor concurrent with cold warmup (starves event loop ~10min
# on e2-medium → unhealthy flapping). Wait for the 'provider auth state pre-warmed' log line:
docker logs --tail 200 openclaw-openclaw-gateway-1 2>&1 | grep -i 'pre-warmed' # 2>&1 so lines the container wrote to stderr reach grep too
docker exec openclaw-openclaw-gateway-1 openclaw doctor --fix 2>&1 \
| tee ~/openclaw-ops/backups/pre-upgrade-extras/doctor-fix-v<TARGET>.log
grep -iE 'error|fail|migrat' ~/openclaw-ops/backups/pre-upgrade-extras/doctor-fix-v<TARGET>.log
# Doctor performs the SQLite migrations (cron jobs.json→SQLite, session metadata, agent
# registry, auth-profile canonical rewrite). Any 'error'/'migration failed' → halt + read,
# do NOT force-restart. Confirm ~/.openclaw/openclaw.json.bak written (restore = cp bak json + restart).
# If wedged: docker exec openclaw-openclaw-gateway-1 pkill -9 -f openclaw-doctor
# After doctor's config writes, clean restart on migrated config:
docker compose restart openclaw-gateway # ~130s
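An unbounded `until` loop spins forever if the container never becomes healthy, and a status of `(unhealthy)` contains the substring `healthy`. A sketch of a bounded wait that matches the parenthesized status exactly; both helper names are hypothetical and the timeout values are illustrative.

```bash
# Hypothetical helpers for a bounded health wait.
# is_healthy_status: true only for a docker status string containing "(healthy)".
is_healthy_status() {
  case "$1" in
    *"(healthy)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_until TIMEOUT_S INTERVAL_S CMD...: retry CMD until it succeeds or time runs out.
wait_until() {
  deadline=$(( $(date +%s) + $1 )); interval=$2; shift 2
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}

# gateway_healthy() {
#   is_healthy_status "$(docker ps --filter name=gateway-1 --format '{{.Status}}')"
# }
# wait_until 600 5 gateway_healthy || echo "gateway not healthy after 10 min" >&2
```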
Canary window: 60-min active watch, then 7-day passive observation.
Smoke tests (sequential, sleep 2 between — never parallel; e2-medium thrashes to 400% otherwise):
# (A) Codex auth FIRST (the SPOF — verify chat didn't fall to OpenRouter-Haiku):
docker exec openclaw-openclaw-gateway-1 openclaw config get agents.defaults.model
grep -c openai-codex ~/.openclaw/agents/main/agent/auth-profiles.json # expect >=1: token still present
# (B) Channels probe (30s timeout — 10s default too tight on e2-medium):
docker exec openclaw-openclaw-gateway-1 openclaw channels status --probe --timeout 30000
# Expect all 4 Telegram bots: connected, mode:polling, works. 1008 pairing required → see §6 wipe.
# (C) Heartbeat cron canary (most reliable liveness):
docker exec openclaw-openclaw-gateway-1 openclaw cron run 4febe374-a480-46e6-9ea7-bc87be107e57
sleep 30
docker exec openclaw-openclaw-gateway-1 openclaw cron runs --id 4febe374-a480-46e6-9ea7-bc87be107e57 --limit 1
# Expect duration <60s, ok:true, fresh ~/.openclaw/workspace-kit/HEARTBEAT.md
# (D) Per-cron OUTPUT canary (heartbeat does NOT catch degraded content):
~/openclaw-ops/scripts/main-farm-health.sh # G1 freshness + content-degradation sentinel
# Then eyeball ONE real composed artifact — trigger coach-morning-brief, wait ~60s, confirm
# the delivered summary has no 'unavailable'/'couldn't'/'did not run'/'jq…failed'/⚠️/🛠️.
# If a cron's tools collapsed: check payload.toolsAllow (stale names) + lightContext;
# fix = openclaw-cron-edit.sh ... --clear-tools --no-light-context (no restart).
# NOTE 6.6: cron list shape is now SQLite-backed — confirm cron list --json parses unchanged
# and openclaw-cron-edit.sh still writes correctly (re-validate the #31425 no-op behavior).
# (E) image_generate (the pinned-failure path — DO NOT skip, fails silently):
# Test via the REAL Telegram agent path, NOT CLI infer (different path, masks failures):
# Telegram → Coach: "send the quad reset cards" (reusable-visuals skill)
# Expect: image lands. If image_generate hard-fails → codex auth degraded OR routing
# flapped to codex-responses bridge (#90074, UNFIXED — known, tolerated, message-first).
# DANGER: do NOT apply apiKey SecretRef {source:env} to force routing — crash-loops gateway.
# (F) Diff against snapshot (SEQUENTIAL):
SNAP=$(ls -1dt ~/openclaw-ops/backups/pre-upgrade/*pre-v<TARGET>/ | head -1)
docker exec openclaw-openclaw-gateway-1 openclaw agents list # diff vs $SNAP/agents.txt
sleep 2; docker exec openclaw-openclaw-gateway-1 openclaw cron list # vs cron.txt
sleep 2; docker exec openclaw-openclaw-gateway-1 openclaw config get plugins # vs plugins.txt
# Expected: version bumps, ID churn, time shifts, new bundled plugins.
# UNEXPECTED (investigate): missing agents, lost bindings, lost crons, plugin loaded→disabled.
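For the "UNEXPECTED" cases, the thing to detect is entries present in the snapshot but absent now, rather than any textual difference. A sketch using `comm` on sorted copies; `lost_entries` is a hypothetical helper, and it assumes each file carries one stable identifier per line (names or IDs extracted beforehand), which the raw CLI output may not.

```bash
# Hypothetical helper: print lines present in snapshot file $1 but missing
# from current file $2. Succeeds only when nothing was lost.
lost_entries() {
  a=$(mktemp) && b=$(mktemp) || return 2
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  lost=$(comm -23 "$a" "$b")
  rm -f "$a" "$b"
  [ -z "$lost" ] || { printf '%s\n' "$lost"; return 1; }
}

# lost_entries "$SNAP/agents.txt" current-agents.txt || echo "investigate: agents missing" >&2
```

New entries in the current listing are ignored on purpose, which matches the "new bundled plugins" expectation.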
| Risk | Check | Pass criteria |
|---|---|---|
| Codex OAuth SPOF | auth-profiles.json has token; chat model not Haiku | agents.defaults.model = codex; token present in main's store |
| Image-gen routing (#90074, UNFIXED) | Telegram→Coach "send quad reset cards" | Image lands. Flapping tolerated; never block. NO SecretRef apiKey |
| Cron capability semantics (UNCHANGED in range) | audit_cron_prompts.py --contracts clean | No G4 scan_capability violations; read/write/exec names intact |
| Cron SQLite migration (NEW 6.1/6.5) | cron list --json parses; openclaw-cron-edit.sh writes | Shape unchanged; all 18 crons present, status:ok; #31425 no-op re-verified |
| Session SQLite migration (NEW 6.5) | session-reset-monitor + persona paths resolve | sessions.json path still readable OR tooling updated; persona-delete trick re-tested |
| CLI exit-code change (6.8 only — N/A on 6.6) | host scripts' $? handling | If on 6.8: re-validate main-farm-health/main-telegram-watch fail-closed logic |
| Persona bootstrap (12000-char cap) | session-start log | No unexpected "Bootstrap truncation warning"; sessions re-injected cleanly |
| Spool wedge (structural, guarded) | spool-oom-sweep.sh in crontab post-restore | Cron line present; no stranded .processing post-recreate |
| Dreaming event-loop block (guarded) | gateway-cpu-watchdog.sh in crontab | Line present; CPU not pinned post-warmup |
| coach-checkin SPOF | plugin enabled + linked | ~/.openclaw/plugins-src/coach-checkin/ intact; enabled (6.6 #93886 plugin-load boundary watch) |
| bonjour guard | plugins.deny contains bonjour | Persists (load-bearing anti-crash-loop) |
| Channels | 4 Telegram bots | connected, polling, works |
Trigger: any STOP gate fails, or something is broken and not fixable in <15 min.
SNAP=$(ls -1dt ~/openclaw-ops/backups/pre-upgrade/*pre-v<TARGET>/ | head -1)
sed -i 's|gateway:v<TARGET>|gateway:v2026.5.22|' ~/openclaw/.env
cd ~/openclaw && docker compose down
tar -xzf "$SNAP/state.tar.gz" -C ~ # restores ~/.openclaw/, ~/openclaw-ops/, .env, override
# CRITICAL for this upgrade: the SQLite migrations are one-way. state.tar.gz restores the
# PRE-migration ~/.openclaw (jobs.json + sessions.json + auth-profiles.json all pre-rewrite),
# which is exactly what v5.22 expects. Do NOT keep the migrated SQLite DBs.
docker compose pull openclaw-gateway # re-pull old image (works ∵ :v2026.5.22 stayed tagged, §4 gate)
docker compose up -d openclaw-gateway
until docker ps --filter name=gateway-1 --format '{{.Status}}' | grep -qF '(healthy)'; do sleep 5; done # match "(healthy)" exactly; plain "healthy" also matches "(unhealthy)"
docker exec openclaw-openclaw-gateway-1 openclaw --version # expect 2026.5.22
crontab ~/openclaw-ops/backups/crontab.bak.upgrade-v<TARGET> # restore paused host crons
# If codex token was spent during the window: re-auth (device code).
docker exec -it openclaw-openclaw-gateway-1 openclaw models auth login --provider openai-codex
Rollback hinges on: (1) :v2026.5.22 AR tag preserved (§4 gate), (2) state.tar.gz source-of-truth (NOT the migrated SQLite), (3) git-head.txt for source revert. File the failure in BACKLOG.md.
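The local preconditions can be checked before the window opens, so that a rollback is never attempted without its artifacts. A sketch that only tests for the files; `rollback_ready` is a hypothetical helper, the location of git-head.txt inside the snapshot directory is an assumption, and the AR tag still has to be checked against the registry.

```bash
# Hypothetical helper: verify the local rollback artifacts exist and are non-empty.
# $1 = snapshot directory (assumed to hold state.tar.gz and git-head.txt)
# $2 = crontab backup file
rollback_ready() {
  for f in "$1/state.tar.gz" "$1/git-head.txt" "$2"; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
  done
}

# rollback_ready "$SNAP" ~/openclaw-ops/backups/crontab.bak.upgrade-v<TARGET> \
#   || echo "do not start the window" >&2
```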
These are the deltas the v5.22 plan template does NOT cover — all driven by openclaw doctor post-upgrade, captured pre-upgrade in §3(c):
- Cron SQLite migration: confirm cron list --json shape; re-validate openclaw-cron-edit.sh + audit_cron_prompts.py + snapshot-openclaw.sh frozen-diff against SQLite; add the new cron SQLite DB to backup-claws.sh tarball.
- Session SQLite migration: confirm sessions/sessions.json paths still resolve for session-reset-monitor / persona-rotation, or update tooling; re-test the persona-delete-from-sessions immediate-refresh trick.
- Provider ID canonicalization: google-generative-ai (6.1) + OpenRouter/Vertex prefix normalization (6.8); audit openclaw.json provider IDs (§P2).
- CLI exit-code change (6.8 only): re-validate host scripts' $? handling.

Cron capability semantics (lightContext/toolsAllow→promptMode:minimal, read/write/exec names) are UNCHANGED in 5.22→6.8, so the CLAUDE.md placement mandate and audit_cron_prompts.py G4 assumptions hold; do not rewrite them.
Bottom line: Hold on 5.22. The pin reason (image-gen #90074) is unresolved through 6.8, 6.8 is flagged-skip, and the move adds three one-way SQLite migrations. If a non-image driver forces it and #88312 is confirmed fixed in the changelog, land on 6.6 (not 6.8), re-auth codex first, run the tight canary above, and keep state.tar.gz + the :v2026.5.22 AR tag as the one-command rollback.
The GitHub issue numbers cited (#90074, #94570, #90361, #94033, #88312, #93886, #31425) and the "6.8 is community-flagged skip" claim come from the workflow's web-research agent and have not been independently re-fetched. Before acting on any GO, confirm each issue exists and matches the described symptom (per the verify-upstream-issue-refs lesson). The version landscape, the SQLite-migration deltas, and the internal runbook steps are grounded in local files + changelog and are higher-confidence.