Architecture
docsxai is a small family of packages around one deterministic core. The
engine owns the flow-file parser, the Playwright-backed runtime, the
calibration aids (lint, diagnose, flow-tree, style), the plugin
runtime, and the target-site auth strategies. Everything else orchestrates
around it: the Claude Code plugin and the standalone MCP server are invocation
surfaces over the engine, the backend persists doc packs, the viewer renders
them. None of the satellites adds browser primitives of its own.
The two-mode split
Calibration is AI-assisted and rare. A host agent - through the
Claude Code plugin or the MCP server -
drives discovery against the live app (that part is
browxai’s surface), picks one canonical
locator per step, and commits the result as a flow-file. The engine helps at
write-time, not run-time: lint catches authoring mistakes statically,
diagnose packages halt context into typed recommendations, flow-tree
visualises the extends graph, and the actionable() probe says whether a
selector is clickable before the step is ever written down.
Execution is deterministic and continuous. docsxai run replays the
flow through headless Chromium with no agent and no MCP in the loop. The
environment block (frozen clock, pinned locale, timezone, viewport, color
scheme) makes the same flow against the same target state produce
byte-identical screenshots; a keystone test enforces that against real
Chromium on every change to the runtime.
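As a rough sketch of how an environment block like this could be pinned, the snippet below maps a hypothetical `EnvironmentBlock` shape onto the option names Playwright's `browser.newContext()` accepts (`locale`, `timezoneId`, `viewport`, `colorScheme`). The `EnvironmentBlock` type, its field names, and the `toContextOptions` helper are assumptions made for illustration, not docsxai's actual schema or code.

```typescript
// Hypothetical shape of a flow's environment block; field names are assumptions.
interface EnvironmentBlock {
  clock: string; // ISO-8601 instant the page clock is frozen at
  locale: string; // e.g. "en-US"
  timezone: string; // IANA zone, e.g. "UTC"
  viewport: { width: number; height: number };
  colorScheme: "light" | "dark";
}

// Subset of Playwright's browser.newContext() options that this sketch pins.
interface PinnedContextOptions {
  locale: string;
  timezoneId: string;
  viewport: { width: number; height: number };
  colorScheme: "light" | "dark";
}

// Translate the block into context options plus a fixed Date for the clock.
function toContextOptions(env: EnvironmentBlock): {
  context: PinnedContextOptions;
  fixedTime: Date;
} {
  const fixedTime = new Date(env.clock);
  if (isNaN(fixedTime.getTime())) {
    // Reject unparseable clocks early: a drifting clock breaks determinism.
    throw new Error(`invalid clock value: ${env.clock}`);
  }
  return {
    context: {
      locale: env.locale,
      timezoneId: env.timezone,
      viewport: { width: env.viewport.width, height: env.viewport.height },
      colorScheme: env.colorScheme,
    },
    fixedTime,
  };
}

const pinned = toContextOptions({
  clock: "2024-01-01T00:00:00Z",
  locale: "en-US",
  timezone: "UTC",
  viewport: { width: 1280, height: 720 },
  colorScheme: "light",
});
console.log(pinned.context.timezoneId, pinned.fixedTime.toISOString());
```

With Playwright installed, `pinned.context` could be passed to `browser.newContext(...)` and `pinned.fixedTime` to `page.clock.setFixedTime(...)` (available in recent Playwright releases). Whether the engine wires it this way is not stated here.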
The engine never calls a model
The engine has no model-provider SDK anywhere in its dependency tree, and the project treats adding one as a contract violation. Calibration-time inference is supplied by whatever host agent you already run; execution-time inference does not exist. Two consequences worth internalising:
- Halts are a feature. When a locator or success check fails, the run
halts with a [cause: ...] prefix instead of asking a model to guess a
fallback. Drift is a signal to recalibrate, not to absorb silently.
- The cost story stays honest. A doc refresh costs one headless browser
session, so running it per commit is no more exotic than running your
Playwright suite.
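The halt-instead-of-guess behaviour can be sketched as below. This is a minimal illustration under stated assumptions: the `HaltError` class, the two cause names, the `Step` shape, and the `replay` function are all hypothetical, and the engine's real cause vocabulary and message format may differ.

```typescript
// Hypothetical halt causes; the engine's real cause names are not given here.
type HaltCause = "locator-not-found" | "success-check-failed";

class HaltError extends Error {
  readonly haltCause: HaltCause;
  readonly stepIndex: number;

  constructor(haltCause: HaltCause, stepIndex: number, detail: string) {
    // The message carries the cause as a prefix so logs are greppable.
    super(`[cause: ${haltCause}] step ${stepIndex}: ${detail}`);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, HaltError.prototype);
    this.name = "HaltError";
    this.haltCause = haltCause;
    this.stepIndex = stepIndex;
  }
}

// Assumed step shape: one canonical selector per step, no fallbacks.
interface Step {
  selector: string;
}

// Replay steps against the selectors currently present on the page.
// On the first miss, halt; never try alternative selectors.
function replay(steps: Step[], present: string[]): number {
  steps.forEach((step, i) => {
    if (present.indexOf(step.selector) === -1) {
      throw new HaltError("locator-not-found", i, `no match for ${step.selector}`);
    }
  });
  return steps.length;
}

try {
  replay([{ selector: "#save" }, { selector: "#publish" }], ["#save"]);
} catch (err) {
  if (err instanceof HaltError) {
    console.log(err.message);
  }
}
```

The point of the design is that the failure surfaces with enough context to recalibrate, and nothing on the execution path attempts a repair.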
The BrowserDriver seam
The runtime is written against a thin BrowserDriver interface, not against
Playwright directly: goto, click, fill, the wait primitives, the
success-check reads, screenshot, boundingBox, and the write-time
actionable(selector) probe. The one Playwright integration point
(PlaywrightDriver) stays small and is the engine’s single Playwright import
site. This seam is what lets browxai slot in as the model-agnostic discovery
driver during calibration while execution keeps its own raw Playwright
sessions - and it keeps the runtime testable without a browser at all.
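To make the seam concrete, here is a minimal sketch of a driver interface and an in-memory fake that stands in for a browser in tests. The method names `goto`, `click`, `fill`, and `actionable` come from the list above; their signatures, the `FakeDriver` class, and the `login` function are assumptions for illustration, and the wait primitives, success-check reads, `screenshot`, and `boundingBox` are left out for brevity.

```typescript
// Assumed signatures: the method names are from the docs, the types are guesses.
interface BrowserDriver {
  goto(url: string): Promise<void>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  actionable(selector: string): Promise<boolean>;
}

// In-memory stand-in: records calls instead of driving a browser.
class FakeDriver implements BrowserDriver {
  readonly calls: string[] = [];
  private readonly clickable: string[];

  constructor(clickable: string[]) {
    this.clickable = clickable;
  }

  async goto(url: string): Promise<void> {
    this.calls.push(`goto ${url}`);
  }

  async click(selector: string): Promise<void> {
    this.calls.push(`click ${selector}`);
  }

  async fill(selector: string, value: string): Promise<void> {
    this.calls.push(`fill ${selector}=${value}`);
  }

  // Write-time probe: here it only checks the configured selector list.
  async actionable(selector: string): Promise<boolean> {
    return this.clickable.indexOf(selector) !== -1;
  }
}

// Runtime code depends only on the interface, so any driver can run it.
async function login(driver: BrowserDriver, baseUrl: string): Promise<void> {
  await driver.goto(`${baseUrl}/login`);
  await driver.fill("#email", "demo@example.com");
  await driver.click("button[type=submit]");
}

const fake = new FakeDriver(["button[type=submit]"]);
login(fake, "https://app.example.test").then(() => {
  console.log(fake.calls.join("\n"));
});
```

A Playwright-backed implementation of the same interface would be the only place that imports Playwright, which is the property the seam is meant to preserve.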
The package family
| Package | Role |
|---|---|
| engine | Parser, runtime, CLI, auth strategies, plugin runtime, exporters. |
| plugin | Claude Code plugin: calibrate + diagnose skills, deterministic commands. |
| mcp | Stdio MCP server for any host: orchestration + doc-pack introspection. |
| backend | Doc-pack persistence: revisions, blobs, OAuth 2.1, GitHub webhook. |
| viewer | Interactive viewer, browser-free burn renderer, Starlight emitter. |
| plugin-confluence | Publisher plugin: idempotent Confluence Cloud push. |
| plugin-starlight | Renderer plugin: production Starlight docs site. |
Every dependency in that table points inward: surfaces wrap the engine, the engine
wraps BrowserDriver, and nothing on the execution path knows an agent
exists.