Loom: An Agent-First Browser Runtime

Playwright was built for humans. Loom is built for the agent driving the browser — open source, deterministic, replayable, MCP-native.

Johannes Rummel

Staff Engineer

May 18, 2026

It's 11pm. You shipped a sign-up flow this morning. You ask Claude to wire up Playwright tests so you can stop manually clicking through every release. The suite passes once. By morning, half the runs are red and you're three waitForTimeout(2000) calls deep in something you're not proud of.

This was the moment I started building Loom.

The agents we use every day — Claude Code, Cursor, Claude Desktop — already write code, run tests, and shape your codebase. The one thing they can't reliably do is drive a real browser. Not because the model is too dumb. Because Playwright and Puppeteer were built for humans watching a screen. Errors are strings, not codes. No replay. No isolation. No budget. Setup takes a weekend.

We just open-sourced Loom — a local, deterministic, replayable browser runtime that plugs into your AI agent as a native MCP tool. This post is the writeup.

What's actually broken

Watching agents try to use Playwright on a real product, the pain breaks into six categories.

[Figure: Agent ↔ Loom ↔ Browser. Bidirectional flow, every action receipted, every run replayable.]

  • Flaky. Same prompt, two different results.
  • Unrepeatable. No way to rerun the exact session, no audit trail when something breaks.
  • Unsafe. A page from an LLM-supplied URL can exfiltrate cookies, drop files, or register a service worker. Blast radius: your laptop.
  • Tangled. Setup is Chromium + Playwright + a test runner + tool plumbing + credentials.
  • Costly. Escape local setup for a cloud service and you pay per browser-minute. Flake multiplies the bill.
  • Blind. No time-travel inspection. The DOM at action 4 of 20 is gone. You rerun and hope.

Six symptoms of one mismatch: the runtime was designed for a human watching a single run. The caller is now an LLM running 200 sessions a day.

What Loom commits to

Loom is Apache-2.0 / MIT, runs locally on macOS and Linux, and bundles its own MCP server. The interesting parts aren't the verb list (navigate, click, evaluate, snapshot, wait, and the rest of the standard ten). They're the six architectural commitments behind those verbs.

1. The session is an append-only, hash-chained WAL. Every action emits an ActionReceipt with a prev_hash. The chain anchors back to a seed and epoch_ms. Replay re-executes the action list; because the seed and clock are pinned, the replay's hash chain is bit-equal to the source's by construction. That reduces loom session diff to comparing two manifests.

2. Every artifact is content-addressed by SHA-256. DOM snapshots, screenshots, exports — one CAS keyed by content hash. Replay equality, deduplication, and GC reference protection are all the same mechanism.

3. Errors are wire-schema, not strings. LoomErrorCode is a stable kebab-case enum (~25 codes) shared across every crate, process, and SDK. A linter asserts every emitter uses a documented variant. Your agent branches on kind, not regex on the message.

4. Untrusted code runs out-of-process or in WASM — never in the daemon. Chromium lives in a subprocess speaking CBOR shim_protocol; the daemon never imports CDP. The web.* verbs run in a WASM cdylib with a curated WIT host interface. A renderer compromise has to traverse two protocol boundaries to reach the daemon.

5. The core is platform-agnostic. loom-core imports zero macOS or Linux symbols. Platform code lives behind stable seams in sibling crates. The Linux build doesn't fork — it's the same core with one fewer adapter linked.

6. The action registry is the single source of truth. Every verb is declared once. Docs, man pages, CLI --help, and the JSON-RPC router all derive from it. A CI gate fails any PR where they drift.
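
Commitment 1 can be sketched in a few lines of Python. This is a minimal illustration of a hash-chained receipt log, not Loom's actual receipt format: the canonical-JSON encoding, the way seed and epoch_ms are folded into an anchor, and the names receipt_hash and run_chain are all assumptions made for the example.

```python
import hashlib
import json


def receipt_hash(prev_hash: str, action: dict) -> str:
    """Hash the previous link together with a canonical encoding of the action."""
    # Sorted keys and fixed separators give the same bytes for the same action.
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def run_chain(seed: int, epoch_ms: int, actions: list) -> list:
    """Return one hash per action, anchored to (seed, epoch_ms)."""
    # Hypothetical anchor: the post only says the chain anchors to these values.
    prev = hashlib.sha256(f"{seed}:{epoch_ms}".encode("utf-8")).hexdigest()
    hashes = []
    for action in actions:
        prev = receipt_hash(prev, action)
        hashes.append(prev)
    return hashes


actions = [
    {"verb": "web.navigate", "url": "https://example.com"},
    {"verb": "web.click", "selector": "button"},
]
changed = [actions[0], {"verb": "web.click", "selector": "a"}]

source = run_chain(7, 1_700_000_000_000, actions)
replay = run_chain(7, 1_700_000_000_000, actions)
tampered = run_chain(7, 1_700_000_000_000, changed)

print(source == replay)            # same seed, clock, and actions
print(source[1] == tampered[1])    # second action differs
```

Because each hash folds in the one before it, changing the second action leaves the first receipt untouched and alters every receipt from that point on. That property is what lets two runs be compared by walking their hashes side by side.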

None of these are bolted-on features. They reinforce each other — which is why replay-equality, typed errors, and process isolation aren't on a roadmap.
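
To make commitment 3 concrete, here is a hedged sketch of what branching on a typed error looks like from the agent's side. The three codes below are hypothetical stand-ins (the post does not list the real LoomErrorCode variants), and ActionError and next_step are illustrative names, not part of any Loom SDK.

```python
from enum import Enum


class ErrorCode(str, Enum):
    # Hypothetical kebab-case codes, invented for this example.
    ELEMENT_NOT_FOUND = "element-not-found"
    NAVIGATION_TIMEOUT = "navigation-timeout"
    BUDGET_EXCEEDED = "budget-exceeded"


class ActionError(Exception):
    """An error carrying a stable machine-readable kind plus a free-form message."""

    def __init__(self, kind: ErrorCode, message: str):
        super().__init__(message)
        self.kind = kind


def next_step(err: ActionError) -> str:
    """Decide what to do from the code alone; the message is never parsed."""
    if err.kind is ErrorCode.ELEMENT_NOT_FOUND:
        return "re-snapshot"
    if err.kind is ErrorCode.NAVIGATION_TIMEOUT:
        return "retry"
    return "abort"


# A code arriving over the wire as a string maps straight onto the enum.
wire_kind = ErrorCode("navigation-timeout")
decision = next_step(ActionError(wire_kind, "wording here can change freely"))
print(decision)
```

The message text can be reworded between releases without breaking the caller, since only the enum value drives the decision.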

The architecture

Eleven crates. Four processes. Three trust boundaries. About 200 Rust source files.

[Figure: Loom architecture. Clients, daemon, isolated runtimes, and Chromium, across multiple trust zones and protocols.]

Clients (dashed): Claude Code, Claude Desktop, Cursor, anything speaking MCP — plus official Python and TypeScript SDKs. JSON-RPC over a Unix socket. Same wire contract for all because the action registry is the single source of truth.

Binaries: loom-cli, loom-mcp, loom-daemon.

loom-daemon is the only fully trusted process. Inside it: session_manager, manifest_writer, replay_engine, the SHA-256 content_store, budget_enforcer, the OAuth vault, determinism_harness (the script that freezes Math.random, Date.now, performance.now, and animation timing in the page), the JSON-RPC router, the kebab-case error enum, observability, the Playwright trace importer, and ~10 more modules.

Isolated runtimes (dashed blue): loom-surface-web is a WASM cdylib running the web.* verbs through a curated WIT host interface — clock, RNG, blob_put/get, net_request, shim_call. loom-shim-chromium is a separate subprocess under a supervisor with a typed restart budget, speaking CBOR shim_protocol to the daemon. It's the only thing in the system that imports CDP.

Chromium, pinned at v132, untrusted. A renderer compromise has to traverse two protocol boundaries to reach anything in the daemon — and never crosses into agent code.
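
The daemon's SHA-256 content_store is the piece that ties deduplication and GC protection to the same key. A minimal in-memory sketch of that idea, assuming a simple reference count per blob: the ContentStore class and its put, release, and gc methods are hypothetical names, and Loom's real store is presumably on disk and more involved.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}  # hash -> bytes
        self._refs = {}   # hash -> number of live references

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored once.
        self._blobs.setdefault(key, data)
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def release(self, key: str) -> None:
        self._refs[key] -= 1

    def gc(self) -> int:
        """Delete blobs with no remaining references; return how many were removed."""
        dead = [k for k, n in self._refs.items() if n <= 0]
        for k in dead:
            del self._blobs[k]
            del self._refs[k]
        return len(dead)

    def blob_count(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"<html>same snapshot</html>")
second = store.put(b"<html>same snapshot</html>")
print(first == second, store.blob_count())
```

Two sessions that capture the same DOM snapshot share one blob, and the blob survives garbage collection until both references are released.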

What it looks like to use it

The same task — open a page, click a button — from the three entry points.

[Figure: Three entry points: AI agent (Cursor/Claude), terminal (loom CLI), and SDK (Python/TS/Rust). Same session model. Same receipts.]

All three create the same session, emit the same two receipts, write the same two entries into the same hash-chained WAL. The only difference is who's calling.

From the AI agent. Add the MCP server to your client config and loom.web.* shows up as native tools. The MCP server lazily creates a session on first call and reuses it across the conversation. No session_id plumbing.

From the terminal. loom session create returns an ID. loom action web.navigate --session $S produces a typed receipt with an action hash. loom session replay $S re-runs the chain and asserts bit-equality. loom session diff $A $B tells you which action diverged.

From the SDK. Python, TypeScript, or Rust:

PYTHON
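import loom

# The context manager scopes the session to this block.
with loom.Session.create(profile="standard") as s:
    r = s.navigate("acme.com")
    # Each action returns a receipt carrying its hash in the chain.
    assert r.action_hash.startswith("a4f9c2")
    s.click("button:has-text('Sign up')")
    # s.session_id is replayable bit-for-bit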

Session lifecycle and the replay guarantee are identical across all three. This is the part that's hard to do as a wrapper around Playwright. It's why we didn't.
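
The "which action diverged" answer from loom session diff follows from the chained hashes: once two runs disagree at one action, every later hash disagrees too, so the first mismatch is the culprit. A small sketch of that comparison, assuming each manifest has already been reduced to an ordered list of action hashes; first_divergence is a hypothetical helper, not the CLI's implementation.

```python
from typing import List, Optional


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Return the index of the first action whose hash differs, or None if equal."""
    for i, (hash_a, hash_b) in enumerate(zip(a, b)):
        if hash_a != hash_b:
            return i
    # One run is a strict prefix of the other: it diverges where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


run_a = ["a4f9", "77c1", "e02b", "19dd"]
run_b = ["a4f9", "77c1", "5b3a", "c840"]
print(first_divergence(run_a, run_b))
```

Here the two runs agree on the first two actions and split at index 2, which is where debugging should start.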

When Loom is the right call, and when it isn't

Right call when:

  • You want your agent to drive a real browser as a native tool — not write Playwright scripts you then run by hand.
  • Your AI-driven tests flake and you want same prompt → same run, with a hash you can diff when one breaks.
  • You're automating pages from untrusted sources and would rather a malicious page not reach your host.
  • You need a complete audit trail of agent actions for compliance, debugging, or reproducible benchmarks.

Wrong call when:

  • You need cross-browser coverage (Firefox, WebKit). Chromium only.
  • You need Windows. macOS and Linux only for now.
  • You're writing a traditional human-driven test suite. Playwright is more mature, has wider community support, and the trace viewer is excellent for that.
  • You'd rather pay someone else for browser-minutes. Browserbase + Stagehand is the right shape.

Loom earns its keep on determinism, MCP-native ergonomics, and process isolation — not on feature breadth.

Try it

Loom is pre-1.0. The README has a stability matrix covering what's frozen and what's still beta.

BASH
brew install mentiora-ai/loom/loom
loom postinstall   # fetches the pinned Chromium build
loom doctor

Add the MCP server to Claude Desktop or Cursor:

JSON
{
  "mcpServers": {
    "loom": {
      "command": "loom-mcp",
      "args": ["serve"]
    }
  }
}

Or from the Claude Code CLI:

BASH
claude mcp add loom -- loom-mcp serve

And the agent has a browser.

Repo: github.com/mentiora-ai/loom. Issues, PRs, and "this is the wrong shape" arguments welcome.

If you're running agents in regulated industries — health, finance, legal — and want a managed deployment with the audit-trail guarantees Loom provides, that's where Mentiora sits. Reach out: contact@mentiora.ai.
