ADR-0004: Add read-only workspace tasks and no-op completion
Date: 2026-05-27
Status: accepted
Deciders: Perago maintainers
Context
Perago workspace tasks currently model every successful workspace attempt as a LakeFS publication. That model breaks down in two cases: a task may read workspace files without producing workspace changes, or it may be allowed to write but happen to produce no diff for a particular input. Creating empty LakeFS commits for those attempts records executor activity, not workspace content change, in workspace history, and LakeFS can reject a commit that contains no changes. We need a contract that supports read-only workspace nodes and writable-but-no-op nodes without weakening the existing soft-fenced publication model.
Decision
WorkspaceSpec gets an explicit read_only: bool = False parameter.
read_only=True declares a workspace task that consumes versioned workspace input but never publishes workspace changes. It downloads the workspace, runs the task body and guardrails, returns a WorkspaceOutput whose ref equals the input workspace.ref, and skips diff checks, target HEAD checks, staging branches, LakeFS commits, and publication. It is not an OS-level read-only mount; writes to the attempt-local workspace are discarded during cleanup.
@task(
    name="metadata.inspect",
    owner_email="data@example.com",
    workspace=WorkspaceSpec(prefix="/audio/render", read_only=True),
)
def inspect_metadata(workspace: Path, params: InspectParams) -> InspectOutput:
    manifest = workspace / "manifest.json"
    return InspectOutput(found=manifest.exists())
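The read-only contract described above can be sketched roughly as follows. This is a minimal illustration, not Perago's runtime: `run_read_only`, the `files` mapping that stands in for the workspace download, and the one-field `WorkspaceOutput` dataclass are all hypothetical names introduced here.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class WorkspaceOutput:
    # Hypothetical stand-in for Perago's WorkspaceOutput; only `ref` is modelled.
    ref: str


def run_read_only(
    input_ref: str,
    files: Dict[str, bytes],
    body: Callable[[Path], object],
) -> Tuple[WorkspaceOutput, object]:
    """Run `body` on an attempt-local copy and pass the input ref through."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # Stand-in for downloading the workspace at `input_ref` (assumption).
        for rel_path, data in files.items():
            target = workspace / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        result = body(workspace)
    # Leaving the `with` block deletes the directory, so any local writes
    # made by `body` are discarded. No diff check, staging, commit, or
    # publication happens: the output ref is the input ref.
    return WorkspaceOutput(ref=input_ref), result
```

The point of the sketch is that nothing on this path inspects what the task body wrote, so the output ref cannot differ from the input ref.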
read_only=False remains the default. Writable workspace tasks check whether the local workspace projection changed after the task body and post guardrails:
| State | Runtime behavior | Output ref |
|---|---|---|
| diff is non-empty and | stage and merge. | published ref |
| diff is non-empty and | stage and replacement publish to the staged commit. | staged commit |
| diff is empty and | complete without staging or committing. | input ref |
| diff is empty and | treat | input ref |
| any other writable HEAD state | fail closed. | none |
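One way to decide whether the local workspace projection changed is to compare content hashes taken before and after the task body. The sketch below assumes that approach; `snapshot` and `diff_is_empty` are hypothetical helpers, and the ADR does not specify how Perago computes the diff. It covers only the empty versus non-empty distinction, not the HEAD-state conditions in the table.

```python
import hashlib
from pathlib import Path
from typing import Dict


def snapshot(workspace: Path) -> Dict[str, str]:
    """Map each file's relative POSIX path to the SHA-256 of its contents."""
    return {
        path.relative_to(workspace).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(workspace.rglob("*"))
        if path.is_file()
    }


def diff_is_empty(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True when no file was added, removed, or modified."""
    return before == after
```

This sketch ignores empty directories and file metadata such as permissions, and it reads each file fully into memory; a real implementation might need to treat those differently.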
Perago does not create LakeFS empty commits. The fact that a node ran belongs to Conductor result state and worker logs, not to LakeFS workspace history.
TaskControls(publish_budget=...) remains invalid for workspace-free tasks. If a read-only workspace task configures publish_budget, perago check, perago extract, and perago start should emit one warning during validation/startup and ignore the budget; task execution must not warn for every attempt.
WorkspaceSpec(read_only=True) disables workspace publication; TaskControls.publish_budget is ignored.
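The validation rule above could be sketched as a single check that runs once per task at validation or startup time and never during attempts. The dataclasses below are simplified hypothetical mirrors of `WorkspaceSpec` and `TaskControls`, `validate_publish_budget` is an invented helper name, the integer type of `publish_budget` is an assumption, and reusing the sentence above as the warning text is likewise an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkspaceSpec:
    # Hypothetical mirror: only the fields named in this ADR.
    prefix: str
    read_only: bool = False


@dataclass(frozen=True)
class TaskControls:
    # Assumption: the budget is modelled as an optional integer.
    publish_budget: Optional[int] = None


def validate_publish_budget(
    task_name: str,
    workspace: Optional[WorkspaceSpec],
    controls: TaskControls,
) -> Optional[int]:
    """Return the effective budget; call once at validation/startup."""
    if controls.publish_budget is None:
        return None
    if workspace is None:
        # Workspace-free tasks: a publish budget remains invalid.
        raise ValueError(f"{task_name}: publish_budget requires a workspace")
    if workspace.read_only:
        # Read-only tasks: warn once here and ignore the budget.
        warnings.warn(
            f"{task_name}: WorkspaceSpec(read_only=True) disables workspace "
            "publication; TaskControls.publish_budget is ignored.",
            UserWarning,
            stacklevel=2,
        )
        return None
    return controls.publish_budget
```

Because the attempt path would only ever see the returned effective budget, it has nothing to warn about, which keeps the warning to one per validation or startup run.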
Alternatives Considered
Allow LakeFS empty commits
Pros: Minimal runtime branching; every workspace task still returns a new ref.
Cons: Pollutes workspace history with executor activity instead of content changes; does not help read-only nodes; can fail in LakeFS when there are no changes.
Why not: Workspace history should represent workspace content, while node execution belongs to Conductor.
Infer read-only behavior from an empty diff only
Pros: No new task declaration field.
Cons: Cannot distinguish a node that is intentionally read-only from a writable node that happened to produce no diff; forces read-only nodes into writable HEAD checks.
Why not: Read-only is part of the task's workspace access contract and should be explicit.
Reject publish budget on read-only workspace tasks
Pros: Keeps configuration strict.
Cons: Turns a harmless stale parameter into a hard failure and makes migrations noisier.
Why not: A startup/check warning is enough because the budget is simply ineffective when publication is disabled.
Enforce OS-level read-only workspaces
Pros: Catches accidental writes by task code.
Cons: Adds platform-specific filesystem behavior and complicates local cleanup and tests.
Why not: The contract is "no publication", not "immutable local filesystem"; accidental local writes are discarded.
Consequences
"Workspace task" no longer means "always writes a new commit." It means "receives a versioned workspace and returns a workspace output." A workspace output may carry the same ref as the input.
The runtime implementation must detect workspace changes before staging a writable task, and it must add a no-op handling path for writable tasks with empty diffs. Read-only tasks bypass LakeFS publication checks entirely.
Documentation and generated references must consistently distinguish:
workspace-free tasks: no workspace input/output;
read-only workspace tasks: workspace input/output, no publication;
writable no-op completion: workspace input/output, no empty commit, HEAD-state check;
writable publication: workspace input/output with a new published ref.
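The four categories above can be expressed as a small classification, shown here only to make the distinction concrete. `WorkspaceMode` and `classify` are hypothetical names, and the sketch does not model the HEAD-state check that can still fail a writable task closed.

```python
from enum import Enum
from typing import Optional


class WorkspaceMode(Enum):
    WORKSPACE_FREE = "workspace-free"
    READ_ONLY = "read-only"
    WRITABLE_NO_OP = "writable no-op completion"
    WRITABLE_PUBLICATION = "writable publication"


def classify(
    has_workspace: bool,
    read_only: bool,
    diff_empty: Optional[bool],
) -> WorkspaceMode:
    """Classify one attempt; `diff_empty` matters only for writable tasks."""
    if not has_workspace:
        return WorkspaceMode.WORKSPACE_FREE
    if read_only:
        # Read-only tasks skip the diff check, so `diff_empty` is ignored.
        return WorkspaceMode.READ_ONLY
    if diff_empty is None:
        raise ValueError("writable tasks require a diff result")
    if diff_empty:
        return WorkspaceMode.WRITABLE_NO_OP
    return WorkspaceMode.WRITABLE_PUBLICATION
```

Note that read-only is decided from the declaration alone, while the two writable categories are decided per attempt from the diff.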
This ADR refines ADR-0003. The soft-fenced publication protocol still applies whenever a writable task publishes changes or performs no-op branch reconciliation.