Commit Sweeper

Commit Sweeper reviews commits that land on a target repository's main branch. It is intentionally separate from the issue/PR cleanup sweeper: it does not close items, write comments, or try to fix code. It produces one markdown report per commit and, when create_checks=true, also publishes a GitHub Check Run for that commit.

#Goals

Review every code-bearing commit on main for regressions, bugs, and security issues.
Use one Codex worker per reviewed commit.
Keep reports human-readable and markdown-first.
Keep the storage path canonical so each commit has at most one report.
Avoid spending Codex time on pure documentation, changelog, asset, or other non-code commits.
Make the lane easy to disable, manually trigger, and backfill over historic ranges.

#Storage

Reports live at:

records/<repo-slug>/commits/<40-char-sha>.md

That path is the source of truth. Rerunning a commit review overwrites the same file. Manual reruns with an additional prompt also overwrite the same file.
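
A minimal sketch of how the canonical path could be built. `reportPath` is a hypothetical helper name, and the repo slug is taken as an input because this document does not describe how `<repo-slug>` is derived:

```typescript
// Hypothetical helper: builds the canonical report path for one commit.
// Assumes the caller already has the repo slug; only the SHA is validated.
function reportPath(repoSlug: string, sha: string): string {
  const normalized = sha.toLowerCase();
  if (!/^[0-9a-f]{40}$/.test(normalized)) {
    throw new Error(`expected a full 40-character SHA, got "${sha}"`);
  }
  // Same commit always maps to the same file, so a rerun overwrites it.
  return `records/${repoSlug}/commits/${normalized}.md`;
}
```

Rejecting abbreviated SHAs keeps the mapping one-to-one: two different spellings of the same commit can never produce two report files.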

Report front matter includes both commit timestamps and review timestamps:

commit_authored_at: author timestamp from the target commit
commit_committed_at: committer timestamp from the target commit
reviewed_at: timestamp when ClawSweeper generated the report

Skipped non-code commits still get a report at the same path with result: skipped_non_code. This preserves a complete audit trail without starting Codex for commits that cannot affect runtime behavior.
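
As an illustration, a skipped report could be rendered like this. The four field names come from the text above; the `---` front matter delimiters, the body sentence, and the `renderSkippedReport` name are assumptions made for this sketch:

```typescript
// Field names follow the report format described above. The delimiter
// style and the body text are assumptions, not the real report layout.
interface ReportFrontMatter {
  commit_authored_at: string;
  commit_committed_at: string;
  reviewed_at: string;
  result: string;
}

function renderSkippedReport(
  authoredAt: string,
  committedAt: string,
  reviewedAt: string,
): string {
  const frontMatter: ReportFrontMatter = {
    commit_authored_at: authoredAt,
    commit_committed_at: committedAt,
    reviewed_at: reviewedAt,
    result: "skipped_non_code",
  };
  return [
    "---",
    `commit_authored_at: ${frontMatter.commit_authored_at}`,
    `commit_committed_at: ${frontMatter.commit_committed_at}`,
    `reviewed_at: ${frontMatter.reviewed_at}`,
    `result: ${frontMatter.result}`,
    "---",
    "",
    "Skipped: every changed path was classified as non-code.",
    "",
  ].join("\n");
}
```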

Use the report lister for time windows instead of date-based storage folders:

pnpm run build
pnpm commit-reports -- --since 6h
pnpm commit-reports -- --since "24 hours ago" --findings
pnpm commit-reports -- --since 7d --non-clean
pnpm commit-reports -- --repo openclaw/openclaw --author steipete --since 7d

The canonical storage stays flat so a rerun can overwrite exactly one file for the commit without first rediscovering a date bucket.
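
Because the layout is flat, time windows have to be computed from report timestamps rather than folder names. The sketch below shows one way a `--since` value could be turned into a filter. The accepted unit spellings and both function names are assumptions, and which front matter timestamp the real lister compares against is not stated here:

```typescript
// Converts a --since value such as "6h", "7d", or "24 hours ago" into
// milliseconds. The supported units are an assumption for this sketch.
function sinceToMs(value: string): number {
  const match = /^\s*(\d+)\s*(minutes?|hours?|days?|m|h|d)(\s+ago)?\s*$/i.exec(value);
  if (!match) {
    throw new Error(`unsupported --since value: "${value}"`);
  }
  const amount = Number(match[1]);
  const unit = String(match[2]).toLowerCase().charAt(0); // "m", "h", or "d"
  const unitMs = unit === "m" ? 60 * 1000 : unit === "h" ? 3600 * 1000 : 86400 * 1000;
  return amount * unitMs;
}

// True when the ISO timestamp falls inside the window that ends at `now`.
function isWithinWindow(timestampIso: string, since: string, now: Date): boolean {
  return now.getTime() - Date.parse(timestampIso) <= sinceToMs(since);
}
```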

#Triggers

Target repositories dispatch push events from main to openclaw/clawsweeper with repository_dispatch.

The receiver workflow is .github/workflows/commit-review.yml.

Manual workflow dispatch supports:

target_repo: repository to inspect
commit_sha: commit SHA to review, or end of a historic range
before_sha: optional range start; when present, review every commit in before_sha..commit_sha
additional_prompt: appended to the Codex prompt for this run
create_checks: create/update GitHub Checks. Leave blank to use the receiver repo variable fallback; otherwise pass true or false. The effective default is false.

enabled: emergency no-op switch
commit_offset: internal continuation offset

The receiver waits 60 seconds by default before selecting commits. This gives the target main push event time to settle across GitHub and the runner without holding the planner for a full review cycle. Override it on openclaw/clawsweeper with:

CLAWSWEEPER_COMMIT_REVIEW_SETTLE_SECONDS=60

Use 0 for manual backfills where the target commit range is already settled, or temporarily raise the value during GitHub event lag incidents.
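
A small sketch of how the settle delay might be resolved from the variable. The default of 60 and the meaning of 0 come from the text above; falling back to the default for unparsable values is an assumption, and `settleSeconds` is a hypothetical name:

```typescript
// Resolves the settle delay in seconds from the raw variable value.
// Assumption: unset, blank, negative, or non-integer values fall back to 60.
function settleSeconds(raw: string | undefined): number {
  const defaultSeconds = 60;
  if (raw === undefined || raw.trim() === "") return defaultSeconds;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : defaultSeconds;
}
```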

The receiver enforces that the commit is reachable from origin/main. Review workers then check out the current target main and reference the reviewed commit by SHA or range, rather than checking out the whole target repository in a detached HEAD state at that commit.

#Scaling

Commit Sweeper defaults to 8 commits per workflow page. The receiver clamps CLAWSWEEPER_COMMIT_REVIEW_PAGE_SIZE between 1 and 256, then pages large ranges:

select up to the configured page size
classify them cheaply
start one matrix worker per code-bearing commit
write skipped reports for non-code commits
commit all reports
dispatch the next page when more commits remain

At the default page size, a 200-commit push is processed as 25 pages (200 / 8): the initial run plus 24 continuation runs. Raise the page size only when the org has enough rate-limit headroom.
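
The clamping and paging described above can be sketched as follows. The bounds and default come from this section; the function names, the fallback on unparsable input, and the use of a numeric offset (mirroring the commit_offset input) are assumptions:

```typescript
const DEFAULT_PAGE_SIZE = 8;
const MIN_PAGE_SIZE = 1;
const MAX_PAGE_SIZE = 256;

// Parses the raw variable value and clamps it into [1, 256].
// Falling back to the default on unparsable input is an assumption.
function resolvePageSize(raw: string | undefined): number {
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  if (isNaN(parsed)) return DEFAULT_PAGE_SIZE;
  return Math.min(MAX_PAGE_SIZE, Math.max(MIN_PAGE_SIZE, parsed));
}

// Returns one page of commits plus the offset for the continuation run,
// or null when no commits remain after this page.
function selectPage(
  commits: string[],
  offset: number,
  pageSize: number,
): { page: string[]; nextOffset: number | null } {
  const page = commits.slice(offset, offset + pageSize);
  const end = offset + page.length;
  return { page, nextOffset: end < commits.length ? end : null };
}
```

Each run handles one page and hands `nextOffset` to the next dispatch, so no single workflow run has to fan out over the whole range.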

#Cheap Classification

The plan job classifies each selected commit before creating the Codex matrix. It uses git diff --name-only for normal commits and git diff-tree for root commits.

Codex runs when any changed path looks reviewable:

source files
tests
scripts and bin/
GitHub workflows
package manifests
lockfiles
build/runtime/config files

Codex is skipped when all changed paths are non-code:

docs directories
changelog-only changes
README/license/notice-style files
markdown/text documentation
common image/video/PDF assets

Mixed commits are reviewed. A commit that changes both docs and code gets a Codex worker.
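
The rule "skip only when every changed path is non-code" can be sketched like this. The patterns below are illustrative guesses at the categories listed above, not the actual rules in src/commit-classifier.ts, and both function names are hypothetical:

```typescript
// Illustrative patterns only; the real classifier may differ.
const NON_CODE_PATTERNS: RegExp[] = [
  /(^|\/)docs\//i,                                   // docs directories
  /(^|\/)CHANGELOG(\.[^/]*)?$/i,                     // changelog files
  /(^|\/)(README|LICENSE|NOTICE)(\.[^/]*)?$/i,       // README/license/notice
  /\.(md|mdx|txt|rst)$/i,                            // markdown/text docs
  /\.(png|jpe?g|gif|webp|mp4|mov|pdf)$/i,            // common assets
];

function isNonCodePath(path: string): boolean {
  return NON_CODE_PATTERNS.some((pattern) => pattern.test(path));
}

// A commit is reviewed as soon as one changed path is not recognizably
// non-code, so mixed docs-and-code commits always get a Codex worker.
function shouldReview(changedPaths: string[]): boolean {
  return changedPaths.some((path) => !isNonCodePath(path));
}
```

Matching on what is known to be non-code, rather than on what is known to be code, means an unfamiliar file type errs toward review instead of being silently skipped.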

#Codex Review

The prompt lives in prompts/review-commit.md.

Codex reviews the provided commit range and is expected to read beyond the diff:

changed files in full
callers/callees
configuration and runtime entry points
adjacent tests and docs when they define contracts
dependency manifests and lockfiles when relevant
package health, release notes, install scripts, and advisories when relevant
general web sources when current external facts matter
focused live tests or smoke checks when feasible

The time budget is 30 minutes per commit.

Codex returns markdown only. The front matter is small and stable so tooling can index results and optionally publish checks, but the body is meant for maintainers to read.

#Report Results

Expected result values:

nothing_found: high-confidence clean review
findings: concrete potential bug, regression, or security issue
inconclusive: insufficient confidence or blocked verification
failed: Codex/tooling failed before a reliable report
skipped_non_code: cheap classifier skipped a non-code-only commit

Issue categories Codex looks for:

bug
regression
security
supply-chain
data loss
privacy
reliability
concurrency
compatibility
concrete test gaps that hide a plausible bug

The prompt explicitly excludes style nits, broad refactor taste, generic cleanliness feedback, speculative security concerns without an executable path, and test coverage complaints without a concrete risk.

#ClawSweeper Repair Dispatch

After reports are committed, .github/workflows/commit-review.yml can dispatch actionable reports (those with result: findings) to this repo's repair-commit-finding-intake.yml workflow. The older repository_dispatch mode is still available in the CLI for tests or future App-permission changes, but the workflow uses workflow_dispatch so the ClawSweeper App only needs Actions write access on openclaw/clawsweeper.

The dispatch is intentionally report-based. ClawSweeper sends the target repo, commit SHA, report repo, report path, report URL, severity, check conclusion, and source run URL. The repair intake fetches the report from the latest openclaw/clawsweeper@main, writes an audit record, and decides whether an automatic PR makes sense on the latest target main.

Disable this without code changes by setting:

CLAWSWEEPER_COMMIT_FINDINGS_ENABLED=false
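
A rough sketch of the dispatch payload and the kill switch. The list of fields comes from the text above, but the property names, the `shouldDispatchFinding` helper, and the assumption that any value other than the literal false leaves dispatch enabled are illustrative only:

```typescript
// Property names are hypothetical; the set of fields follows the text above.
interface FindingDispatch {
  targetRepo: string;
  commitSha: string;
  reportRepo: string;
  reportPath: string;
  reportUrl: string;
  severity: string;
  checkConclusion: string;
  sourceRunUrl: string;
}

// Only reports with result "findings" are dispatched, and only while the
// variable is not set to "false" (assumed to be the sole disabling value).
function shouldDispatchFinding(result: string, enabledVar: string | undefined): boolean {
  if (enabledVar !== undefined && enabledVar.trim().toLowerCase() === "false") {
    return false;
  }
  return result === "findings";
}
```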

The ClawSweeper repair lane owns the PR lifecycle, validation, branch reuse, and no-merge gate. Security-sensitive findings should remain audit-only.

#Optional GitHub Checks

The check name is:

ClawSweeper Commit Review

Check conclusions:

success: high-confidence clean report or skipped non-code commit
failure: high-confidence high/critical finding
neutral: lower-severity finding, inconclusive review, or failed review
timed_out: Codex timed out
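
The mapping from report result to check conclusion could look roughly like this. The result and conclusion values are taken from this document; the `ReviewSummary` shape (a free-form severity string plus `highConfidence` and `timedOut` flags) is an assumption about what the publisher has available:

```typescript
type ReviewResult =
  | "nothing_found"
  | "findings"
  | "inconclusive"
  | "failed"
  | "skipped_non_code";
type CheckConclusion = "success" | "failure" | "neutral" | "timed_out";

// Hypothetical input shape for this sketch.
interface ReviewSummary {
  result: ReviewResult;
  severity?: string;
  highConfidence: boolean;
  timedOut: boolean;
}

function checkConclusion(summary: ReviewSummary): CheckConclusion {
  if (summary.timedOut) return "timed_out";
  if (summary.result === "nothing_found" || summary.result === "skipped_non_code") {
    return "success";
  }
  const severe = summary.severity === "high" || summary.severity === "critical";
  if (summary.result === "findings" && summary.highConfidence && severe) {
    return "failure";
  }
  // Lower-severity findings, inconclusive reviews, and failed reviews.
  return "neutral";
}
```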

Checks are created on the target repository commit by the ClawSweeper GitHub App. They behave like CI in GitHub's UI, but are separate from the target repository's normal test workflows.

Checks are disabled by default to avoid spending GitHub App installation rate limit on status publishing. Enable them per run with create_checks=true, by sending create_checks:true in the dispatch payload, or by setting this variable in the target repository that runs the dispatch workflow:

CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS=true

The receiver also honors the same variable on openclaw/clawsweeper when a manual or repository dispatch omits create_checks.
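
The receiver-side precedence (explicit input, then repository variable, then off) can be sketched as below. The helper names are hypothetical, and treating unrecognized strings the same as blank is an assumption:

```typescript
// Returns true/false for explicit values, undefined for blank or unknown.
function parseFlag(raw: string | undefined): boolean | undefined {
  const value = (raw === undefined ? "" : raw).trim().toLowerCase();
  if (value === "true") return true;
  if (value === "false") return false;
  return undefined;
}

// An explicit create_checks input wins; otherwise the repository variable
// CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS decides; otherwise checks stay off.
function resolveCreateChecks(
  input: string | undefined,
  repoVariable: string | undefined,
): boolean {
  const explicit = parseFlag(input);
  if (explicit !== undefined) return explicit;
  const fallback = parseFlag(repoVariable);
  return fallback !== undefined ? fallback : false;
}
```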

Commit Sweeper does not post comments. Markdown reports are the primary public surface; checks are an optional secondary surface.

#Safety

The review worker receives only target read credentials while Codex runs. The Codex subprocess gets that read token as GH_TOKEN so it can hydrate mentioned issues, PRs, workflow runs, and commit metadata during review. Write/check credentials are created only after Codex exits.

Apart from that read-only token, GitHub and app secrets are stripped from the Codex environment before the subprocess launches.
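
One way to picture the environment handling is a filter that drops secret-looking variables and then injects only the read token. The name patterns and the `codexEnv` helper are assumptions; the real implementation may identify secrets differently:

```typescript
// Assumption: secrets are recognized by variable name. Illustrative only.
const SECRET_NAME_PATTERNS: RegExp[] = [/TOKEN/i, /SECRET/i, /PRIVATE_KEY/i];

// Builds the environment for the Codex subprocess: secret-looking variables
// are dropped, then GH_TOKEN is set to the read-only token.
function codexEnv(
  parentEnv: Record<string, string | undefined>,
  readToken: string,
): Record<string, string> {
  const childEnv: Record<string, string> = {};
  for (const key of Object.keys(parentEnv)) {
    const value = parentEnv[key];
    if (value === undefined) continue;
    if (SECRET_NAME_PATTERNS.some((pattern) => pattern.test(key))) continue;
    childEnv[key] = value;
  }
  childEnv["GH_TOKEN"] = readToken;
  return childEnv;
}
```

Any GH_TOKEN already present in the parent environment is dropped by the filter before the read token is written, so a write-capable token cannot leak through under the same name.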

Commit Sweeper is main-only. PR or branch review is deliberately out of scope for this lane.

#Enable / Disable

Target repositories can disable hook-based dispatch with:

CLAWSWEEPER_COMMIT_REVIEW_ENABLED=false

Manual dispatch can also set enabled=false.

Checks are disabled by default. Enable or disable them without changing code via:

manual dispatch input create_checks=true|false
repository dispatch payload create_checks:true|false
target repo variable CLAWSWEEPER_COMMIT_REVIEW_CREATE_CHECKS=true|false

Reports are always written either way.

Related files:

.github/workflows/commit-review.yml: receiver workflow
docs/commit-dispatcher.md: target repository dispatch template
src/commit-sweeper.ts: commit review CLI
src/commit-classifier.ts: cheap path classifier and skipped reports
src/commit-checks.ts: GitHub Check Run publishing
prompts/review-commit.md: Codex review prompt