Commit Sweeper
Commit Sweeper reviews commits that land on a target repository's main branch. It is intentionally separate from the issue/PR cleanup sweeper: it does not close items, write comments, or try to fix code. It produces one markdown report per commit. It can optionally publish a GitHub Check Run for the commit when create_checks=true.
#Goals
- Review every code-bearing commit on main for regressions, bugs, and security issues.
- Use one Codex worker per reviewed commit.
- Keep reports human-readable and markdown-first.
- Keep the storage path canonical so each commit has at most one report.
- Avoid spending Codex time on pure documentation, changelog, asset, or other non-code commits.
- Make the lane easy to disable, manually trigger, and backfill over historic ranges.
#Storage
Reports live at:
records/<repo-slug>/commits/<40-char-sha>.md
That path is the source of truth. Rerunning a commit review overwrites the same file. Manual reruns with an additional prompt also overwrite the same file.
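As a minimal sketch of the canonical path rule, the helper below builds the report path from a slug and a full SHA. The function name `reportPath`, the lowercase normalization, and the validation rule are assumptions for illustration; the slug format is not defined in this document, so the usage line passes a placeholder.

```typescript
// Hypothetical helper: builds records/<repo-slug>/commits/<40-char-sha>.md.
// The name and the validation rule are assumptions, not the project's code.
const FULL_SHA = /^[0-9a-f]{40}$/;

function reportPath(repoSlug: string, sha: string): string {
  const normalized = sha.toLowerCase();
  if (!FULL_SHA.test(normalized)) {
    throw new Error(`expected a full 40-character SHA, got "${sha}"`);
  }
  // One flat path per commit, so a rerun overwrites the same file.
  return `records/${repoSlug}/commits/${normalized}.md`;
}

// Usage: "example-repo" is a placeholder slug, not the real slug format.
const examplePath = reportPath("example-repo", "a".repeat(40));
```

Because the path depends only on the slug and the SHA, a rerun needs no lookup step before overwriting the earlier report.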
Report front matter includes both commit timestamps and review timestamps:
- commit_authored_at: author timestamp from the target commit
- commit_committed_at: committer timestamp from the target commit
- reviewed_at: timestamp for the ClawSweeper report generation
Skipped non-code commits still get a report at the same path with result: skipped_non_code. This preserves a complete audit trail without starting Codex for commits that cannot affect runtime behavior.
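The front matter for a skipped commit might be rendered roughly as follows. The field names and the skipped_non_code value come from this document; the YAML-style layout, the `renderFrontMatter` helper, and the sample timestamps are assumptions made for the sketch.

```typescript
// Field names are taken from the text; everything else here is hypothetical.
interface ReportFrontMatter {
  commit_authored_at: string;
  commit_committed_at: string;
  reviewed_at: string;
  result: string;
}

// Renders key/value pairs between "---" delimiters (assumed layout).
function renderFrontMatter(fm: ReportFrontMatter): string {
  const lines = Object.entries(fm).map(([key, value]) => `${key}: ${value}`);
  return ["---", ...lines, "---", ""].join("\n");
}

// Placeholder timestamps for illustration only.
const skippedReport = renderFrontMatter({
  commit_authored_at: "2024-01-01T00:00:00Z",
  commit_committed_at: "2024-01-01T00:05:00Z",
  reviewed_at: "2024-01-01T00:10:00Z",
  result: "skipped_non_code",
});
```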
Use the report lister for time windows instead of date-based storage folders:
pnpm run build
pnpm commit-reports -- --since 6h
pnpm commit-reports -- --since "24 hours ago" --findings
pnpm commit-reports -- --since 7d --non-clean
pnpm commit-reports -- --repo openclaw/openclaw --author steipete --since 7d
The canonical storage stays flat so a rerun can overwrite exactly one file for the commit without first rediscovering a date bucket.
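To illustrate why a flat layout still supports time windows, here is a rough sketch of filtering by a timestamp read from each report. It handles only compact windows such as 6h or 7d (the real lister also accepts phrases like "24 hours ago"), and both helper names and the choice of which timestamp to filter on are assumptions.

```typescript
// Hypothetical sketch; the actual commit-reports implementation may differ.
const UNIT_MS: Record<string, number> = { m: 60000, h: 3600000, d: 86400000 };

// Converts a compact window such as "6h" or "7d" to milliseconds.
function sinceToMs(since: string): number {
  const match = /^(\d+)\s*(m|h|d)$/.exec(since.trim());
  if (!match) throw new Error(`unsupported window: ${since}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

interface ListedReport {
  path: string;
  timestamp: string; // assumed to be one of the front matter timestamps
}

// Keeps reports whose timestamp falls inside the window ending at `now`.
function reportsSince(reports: ListedReport[], since: string, now: Date): ListedReport[] {
  const cutoff = now.getTime() - sinceToMs(since);
  return reports.filter((report) => Date.parse(report.timestamp) >= cutoff);
}
```

The window is computed from report metadata at query time, so the storage path never has to encode a date.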
#Triggers
Target repositories dispatch push events from main to openclaw/clawsweeper with repository_dispatch.
The receiver workflow is .github/workflows/commit-review.yml.
Manual workflow dispatch supports:
- target_repo: repository to inspect
- commit_sha: commit SHA to review, or end of a historic range
- before_sha: optional range start; when present, review every commit in before_sha..commit_sha
- additional_prompt: appended to the Codex prompt for this run
- create_checks: create/update GitHub Checks. Leave blank to use the receiver repo variable fallback; otherwise pass true or false. The effective default is false.
- enabled: emergency no-op switch
- commit_offset: internal continuation offset
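The manual dispatch inputs can be pictured as a small typed object. The input names come from the list above; the TypeScript types and the `reviewRange` helper are assumptions, sketched only to show how before_sha turns a single-commit review into a range review.

```typescript
// Input names are from the document; the types are assumed (workflow inputs
// typically arrive as strings).
interface ManualDispatchInputs {
  target_repo: string;
  commit_sha: string;
  before_sha?: string;
  additional_prompt?: string;
  create_checks?: string; // "true", "false", or blank
  enabled?: string;
  commit_offset?: string;
}

// Hypothetical helper: a single commit, or before_sha..commit_sha when a
// range start is given.
function reviewRange(inputs: ManualDispatchInputs): string {
  return inputs.before_sha
    ? `${inputs.before_sha}..${inputs.commit_sha}`
    : inputs.commit_sha;
}
```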
The receiver waits 60 seconds by default before selecting commits. This gives the target main push event time to settle across GitHub and the runner without holding the planner for a full review cycle. Override it on openclaw/clawsweeper with:
CLAWSWEEPER_COMMIT_REVIEW_SETTLE_SECONDS=60
Use 0 for manual backfills where the target commit range is already settled, or temporarily raise the value during GitHub event lag incidents.
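A minimal sketch of reading the settle delay, assuming the environment is passed in as a plain object. The variable name and the default of 60 come from this document; the fallback for invalid or negative values is an assumption.

```typescript
// Hypothetical reader for CLAWSWEEPER_COMMIT_REVIEW_SETTLE_SECONDS.
function settleSeconds(env: Record<string, string | undefined>): number {
  const raw = env.CLAWSWEEPER_COMMIT_REVIEW_SETTLE_SECONDS;
  if (raw === undefined || raw.trim() === "") return 60; // documented default
  const value = Number(raw);
  // Assumption: invalid or negative values fall back to the default.
  return Number.isFinite(value) && value >= 0 ? Math.floor(value) : 60;
}
```

With this shape, 0 disables the wait for settled backfill ranges, and a larger value extends it during event lag.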
The receiver enforces that the commit is reachable from origin/main. Review workers then check out current target main and reference the reviewed commit by SHA/range rather than detaching the whole target repository at the commit.
#Scaling
Commit Sweeper defaults to 8 commits per workflow page. The receiver clamps CLAWSWEEPER_COMMIT_REVIEW_PAGE_SIZE between 1 and 256, then pages large ranges:
- select up to the configured page size
- classify them cheaply
- start one matrix worker per code-bearing commit
- write skipped reports for non-code commits
- commit all reports
- dispatch the next page when more commits remain
At the default page size, a 200-commit push is processed as 25 pages (200 / 8): the initial run plus continuation runs for the remaining pages. Raise the page size only when the org has enough rate-limit headroom.
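The clamp and the paging arithmetic can be sketched as below. The default of 8 and the 1 to 256 bounds come from this document; the helper names and the handling of blank or non-numeric values are assumptions.

```typescript
const DEFAULT_PAGE_SIZE = 8;

// Hypothetical parser for CLAWSWEEPER_COMMIT_REVIEW_PAGE_SIZE.
function pageSize(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_PAGE_SIZE;
  const parsed = Number(raw);
  // Assumption: non-numeric values fall back to the default.
  if (!Number.isFinite(parsed)) return DEFAULT_PAGE_SIZE;
  // Clamp between 1 and 256, as the receiver does.
  return Math.min(256, Math.max(1, Math.floor(parsed)));
}

// Number of workflow pages needed to cover a range of commits.
function pageCount(totalCommits: number, size: number): number {
  return Math.ceil(totalCommits / size);
}

// A 200-commit push at the default page size needs 25 pages.
const pagesForLargePush = pageCount(200, pageSize(undefined));
```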
#Cheap Classification
The plan job classifies each selected commit before creating the Codex matrix. It uses git diff --name-only for normal commits and git diff-tree for root commits.
Codex runs when any changed path looks reviewable:
- source files
- tests
- scripts and bin/
- GitHub workflows
- package manifests
- lockfiles
- build/runtime/config files
Codex is skipped when all changed paths are non-code:
- docs directories
- changelog-only changes
- README/license/notice-style files
- markdown/text documentation
- common image/video/PDF assets
Mixed commits are reviewed. A commit that changes both docs and code gets a Codex worker.
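The skip rule amounts to "skip only when every changed path is non-code". A rough sketch follows; the patterns are illustrative assumptions and are not the rules in src/commit-classifier.ts, which may be broader or stricter.

```typescript
// Hypothetical non-code patterns, loosely following the list above.
const NON_CODE_PATTERNS: RegExp[] = [
  /^docs\//i,                                        // docs directories
  /(^|\/)CHANGELOG(\.[a-z]+)?$/i,                    // changelog files
  /(^|\/)(README|LICENSE|NOTICE)(\.[a-z]+)?$/i,      // README/license/notice
  /\.(md|mdx|txt|rst)$/i,                            // markdown/text docs
  /\.(png|jpe?g|gif|webp|mp4|mov|pdf)$/i,            // common assets
];

function isNonCodePath(path: string): boolean {
  return NON_CODE_PATTERNS.some((pattern) => pattern.test(path));
}

// Returns "review" as soon as any path looks reviewable, so mixed commits
// still get a Codex worker.
function classifyCommit(changedPaths: string[]): "review" | "skipped_non_code" {
  // Assumption: a commit with no changed paths is treated as non-code.
  return changedPaths.every(isNonCodePath) ? "skipped_non_code" : "review";
}
```

Defaulting unknown paths to "review" keeps the classifier conservative: a path is skipped only when it positively matches a non-code pattern.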
#Codex Review
The prompt lives in prompts/review-commit.md.
Codex reviews the provided commit range and is expected to read beyond the diff:
- changed files in full
- callers/callees
- configuration and runtime entry points
- adjacent tests and docs when they define contracts
- dependency manifests and lockfiles when relevant
- package health, release notes, install scripts, and advisories when relevant
- general web sources when current external facts matter
- focused live tests or smoke checks when feasible
The time budget is 30 minutes per commit.
Codex returns markdown only. The front matter is small and stable so tooling can index results and optionally publish checks, but the body is meant for maintainers to read.
#Report Results
Expected result values:
- nothing_found: high-confidence clean review
- findings: concrete potential bug, regression, or security issue
- inconclusive: insufficient confidence or blocked verification
- failed: Codex/tooling failed before a reliable report
- skipped_non_code: cheap classifier skipped a non-code-only commit
Issue categories Codex looks for:
- bug
- regression
- security
- supply-chain
- data loss
- privacy
- reliability
- concurrency
- compatibility
- concrete test gaps that hide a plausible bug
The prompt explicitly excludes style nits, broad refactor taste, generic cleanliness feedback, speculative security concerns without an executable path, and test coverage complaints without a concrete risk.
#ClawSweeper Repair Dispatch
After reports are committed, .github/workflows/commit-review.yml can dispatch actionable result: findings reports to this repo's repair-commit-finding-intake.yml workflow. The older repository_dispatch mode is still available in the CLI for tests or future App-permission changes, but the workflow uses workflow_dispatch so the ClawSweeper App only needs Actions write access on openclaw/clawsweeper.
The dispatch is intentionally report-based. ClawSweeper sends the target repo, commit SHA, report repo, report path, report URL, severity, check conclusion, and source run URL. The repair intake fetches the report from latest openclaw/clawsweeper@main, writes an audit record, and decides whether an automatic PR makes sense on latest target main.
Disable this without code changes by setting:
CLAWSWEEPER_COMMIT_FINDINGS_ENABLED=false
The ClawSweeper repair lane owns the PR lifecycle, validation, branch reuse, and no-merge gate. Security-sensitive findings should remain audit-only.
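A sketch of the dispatch gate and payload, under stated assumptions: the payload key names are invented here to mirror the fields listed above, the lane is assumed to be enabled when the variable is unset, and "actionable" is reduced to result: findings even though the real workflow may apply further criteria.

```typescript
// Hypothetical key names mirroring the fields the document lists.
interface RepairDispatchPayload {
  target_repo: string;
  commit_sha: string;
  report_repo: string;
  report_path: string;
  report_url: string;
  severity: string;
  check_conclusion: string;
  source_run_url: string;
}

// Dispatch only findings reports, and only while the lane is enabled.
function shouldDispatchRepair(
  result: string,
  env: Record<string, string | undefined>,
): boolean {
  // Assumption: unset means enabled; only an explicit "false" disables.
  const flag = (env.CLAWSWEEPER_COMMIT_FINDINGS_ENABLED ?? "true").trim().toLowerCase();
  return flag !== "false" && result === "findings";
}
```

Sending a pointer to the report rather than its contents keeps the payload small and lets the intake read the latest committed version.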
#Optional GitHub Checks
The check name is:
ClawSweeper Commit Review
Check conclusions:
- success: high-confidence clean report or skipped non-code commit
- failure: high-confidence high/critical finding
- neutral: lower-severity finding, inconclusive review, or failed review
- timed_out: Codex timed out
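The mapping from a report to a check conclusion can be sketched as a small function. The result and conclusion values come from this document; the `ReviewOutcome` shape, the severity levels below high, and the confidence and timeout flags are assumptions.

```typescript
type ReportResult =
  | "nothing_found" | "findings" | "inconclusive" | "failed" | "skipped_non_code";
type CheckConclusion = "success" | "failure" | "neutral" | "timed_out";

// Hypothetical outcome shape; field names are not from the project.
interface ReviewOutcome {
  result: ReportResult;
  severity?: "low" | "medium" | "high" | "critical";
  highConfidence?: boolean;
  timedOut?: boolean;
}

function checkConclusion(outcome: ReviewOutcome): CheckConclusion {
  if (outcome.timedOut) return "timed_out";
  switch (outcome.result) {
    case "nothing_found":    // defined as a high-confidence clean review
    case "skipped_non_code":
      return "success";
    case "findings": {
      const severe = outcome.severity === "high" || outcome.severity === "critical";
      return severe && outcome.highConfidence === true ? "failure" : "neutral";
    }
    default:                 // inconclusive or failed
      return "neutral";
  }
}
```

Only a high-confidence high or critical finding fails the check, so lower-severity or uncertain reports do not show up as a red status on the commit.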
Checks are created on the target repository commit by the ClawSweeper GitHub App. They behave like CI in GitHub's UI, but are separate from the target repository's normal test workflows.
Checks are disabled by default to avoid spending GitHub App installation rate limit on status publishing. Enable them per run with create_checks=true, by sending create_checks:true in the dispatch payload, or by setting this variable in the target repository that runs the dispatch workflow:
CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS=true
The receiver also honors the same variable on openclaw/clawsweeper when a manual or repository dispatch omits create_checks.
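One way to picture the resulting precedence: an explicit create_checks value wins, a blank or omitted value defers to the repository variable, and everything else falls back to false. The helper names below are hypothetical and the sketch collapses the target-repo and receiver variables into a single fallback argument.

```typescript
// Parses "true"/"false"; blank or unrecognized values defer to the next source.
function parseFlag(raw: string | boolean | undefined): boolean | undefined {
  if (typeof raw === "boolean") return raw;
  if (raw === undefined) return undefined;
  const value = raw.trim().toLowerCase();
  if (value === "true") return true;
  if (value === "false") return false;
  return undefined;
}

// Hypothetical resolver: explicit input or payload value, then the
// CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS variable, then the default (false).
function resolveCreateChecks(
  explicit: string | boolean | undefined,
  repoVariable: string | undefined,
): boolean {
  return parseFlag(explicit) ?? parseFlag(repoVariable) ?? false;
}
```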
Commit Sweeper does not post comments. Markdown reports are the primary public surface; checks are an optional secondary surface.
#Safety
The review worker receives only target read credentials while Codex runs. The Codex subprocess gets that read token as GH_TOKEN so it can hydrate mentioned issues, PRs, workflow runs, and commit metadata during review. Write/check credentials are created only after Codex exits.
Apart from that read token, the Codex environment is stripped of GitHub and app secrets before the subprocess launches.
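A rough sketch of that environment handling, assuming a name-based filter: anything that looks like a token, secret, or private key is dropped, and the read-only token is then set as GH_TOKEN. The pattern and the `codexEnv` helper are assumptions; the actual stripping rules are not shown in this document.

```typescript
// Hypothetical pattern for secret-looking variable names.
const SECRET_KEY = /(TOKEN|SECRET|PRIVATE_KEY)/i;

// Builds the child environment for the Codex subprocess.
function codexEnv(
  parent: Record<string, string | undefined>,
  readToken: string,
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const [key, value] of Object.entries(parent)) {
    if (value === undefined || SECRET_KEY.test(key)) continue; // strip secrets
    child[key] = value;
  }
  // Only the target read token is exposed to Codex.
  child.GH_TOKEN = readToken;
  return child;
}
```

Setting GH_TOKEN after the filter means that any write-capable GH_TOKEN inherited from the parent is replaced by the read token rather than passed through.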
Commit Sweeper is main-only. PR or branch review is deliberately out of scope for this lane.
#Enable / Disable
Target repositories can disable hook-based dispatch with:
CLAWSWEEPER_COMMIT_REVIEW_ENABLED=false
Manual dispatch can also set enabled=false.
Checks are disabled by default. Enable or disable them without changing code via:
- manual dispatch input create_checks=true|false
- repository dispatch payload create_checks:true|false
- target repo variable CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS=true|false
Reports are always written either way.
#Related Files
- .github/workflows/commit-review.yml: receiver workflow
- docs/commit-dispatcher.md: target repository dispatch template
- src/commit-sweeper.ts: commit review CLI
- src/commit-classifier.ts: cheap path classifier and skipped reports
- src/commit-checks.ts: GitHub Check Run publishing
- prompts/review-commit.md: Codex review prompt