Skip to content

Formal Verification (Security Models)

What This Is (Plain English)

Most software security is tested by trying things and seeing if they break. Formal verification is different — it uses math to prove that certain bad things can't happen, even in situations nobody thought to test.

OpenClaw maintains a set of TLA+ models — small, precise descriptions of how critical security mechanisms are supposed to work. A tool called TLC (the TLA+ model checker) then exhaustively explores every possible sequence of events within a bounded state space: every ordering of messages, every race condition, every edge case. If any sequence violates a security property, TLC produces a counterexample trace — a step-by-step replay showing exactly how things went wrong.

Think of it like this: instead of writing a test that says "try these 5 scenarios and check the result," formal verification says "try every possible scenario up to this complexity bound and prove none of them break the rule."

What the green/red pattern means

Each security claim has two model runs:

  • Green (positive): The model with correct implementation. TLC explores all states and finds no violations. This is the proof that the security property holds.
  • Red (negative): The model with a deliberately introduced bug — a realistic mistake a developer could make (e.g., removing an auth check, making a non-atomic operation). TLC finds a violation and produces a counterexample trace. This proves the model is actually checking something meaningful — if the "broken" version also passed, the model would be too weak.

The red runs are just as important as the green ones. A model that can't catch a known bug isn't providing real assurance.

What this does NOT mean

  • These are models of the security design, not the TypeScript implementation itself. The code could drift from the model.
  • TLC explores a bounded state space. "No violations found" means "no violations within these bounds," not "mathematically impossible in all cases."
  • Some claims depend on environmental assumptions (correct deployment, correct config). The model checks the logic, not whether you deployed it right.

Where the Models Live

Models are maintained in a separate repository: vignesh07/openclaw-formal-models.

Reproducing Results

bash
git clone https://github.com/vignesh07/openclaw-formal-models
cd openclaw-formal-models

# Java 11+ required (TLC runs on the JVM).
# The repo vendors a pinned tla2tools.jar and provides bin/tlc + Make targets.

make <target>

Verified Security Claims

1. Gateway Exposure & Open Gateway Misconfiguration

Claim: Binding beyond loopback without authentication makes remote compromise possible. Token or password authentication blocks unauthenticated attackers.

This is the most basic security property — if you expose the gateway to the network without auth, anyone can talk to your agent. The model proves that token/password auth prevents this under the modeled assumptions.

RunTargetExpected
Greenmake gateway-exposure-v2No violations
Greenmake gateway-exposure-v2-protectedNo violations
Redmake gateway-exposure-v2-negativeCounterexample found

See also: docs/gateway-exposure-matrix.md in the models repo.

2. Nodes.run Pipeline (Highest-Risk Capability)

Claim: nodes.run requires (a) a node command allowlist plus declared commands and (b) live approval when configured. Approvals are tokenized to prevent replay.

nodes.run is the most dangerous capability in OpenClaw — it allows remote command execution between paired nodes. The model verifies that the full authorization chain (allowlist check → command declaration → live approval → single-use token) cannot be bypassed.

RunTargetExpected
Greenmake nodes-pipelineNo violations
Greenmake approvals-tokenNo violations
Redmake nodes-pipeline-negativeCounterexample found
Redmake approvals-token-negativeCounterexample found

3. Pairing Store (DM Gating)

Claim: Pairing requests respect TTL and pending-request caps.

Pairing is how new contacts get authorized to talk to the bot. The model proves that expired pairing codes can't be reused and that an attacker can't flood the system with unlimited pending requests.

RunTargetExpected
Greenmake pairingNo violations
Greenmake pairing-capNo violations
Redmake pairing-negativeCounterexample found
Redmake pairing-cap-negativeCounterexample found

4. Ingress Gating (Mentions & Control-Command Bypass)

Claim: In group contexts requiring mention, an unauthorized "control command" cannot bypass mention gating.

When requireMention is enabled in a group, messages are only processed if they mention the bot. The model proves that control commands (like /reset or /elevated) can't sneak past this gate without a mention.

RunTargetExpected
Greenmake ingress-gatingNo violations
Redmake ingress-gating-negativeCounterexample found

5. Routing / Session-Key Isolation

Claim: DMs from distinct peers do not collapse into the same session unless explicitly linked or configured.

Session isolation is critical — if Alice's conversation bleeds into Bob's, that's a data leak. The model proves that the routing logic keeps sessions separate by default.

RunTargetExpected
Greenmake routing-isolationNo violations
Redmake routing-isolation-negativeCounterexample found

v1++ Extended Models (Concurrency & Retries)

These follow-on models tighten fidelity around real-world failure modes: non-atomic updates, retries, and message fan-out. These are the kinds of bugs that only appear under concurrent load and are nearly impossible to catch with conventional testing.

6. Pairing Store Concurrency & Idempotency

Claim: The pairing store enforces MaxPending and idempotency even under concurrent interleavings. Check-then-write must be atomic/locked. Refresh operations must not create duplicate entries.

Race conditions in pairing could allow an attacker to exceed the pending request cap by sending concurrent requests that all pass the "is there room?" check before any of them write. The model proves this can't happen with proper locking.

RunTargetExpected
Greenmake pairing-raceNo violations (atomic cap check)
Greenmake pairing-idempotencyNo violations
Greenmake pairing-refreshNo violations
Greenmake pairing-refresh-raceNo violations
Redmake pairing-race-negativeCounterexample (non-atomic begin/commit race)
Redmake pairing-idempotency-negativeCounterexample found
Redmake pairing-refresh-negativeCounterexample found
Redmake pairing-refresh-race-negativeCounterexample found

7. Ingress Trace Correlation & Idempotency

Claim: Ingestion preserves trace correlation across fan-out and is idempotent under provider retries.

When one external event (e.g., a Telegram message) fans out into multiple internal messages, every part must keep the same trace/event identity. Provider retries must not cause double-processing. If provider event IDs are missing, deduplication falls back to a safe key (e.g., trace ID) to avoid dropping distinct events.

RunTargetExpected
Greenmake ingress-traceNo violations
Greenmake ingress-trace2No violations
Greenmake ingress-idempotencyNo violations
Greenmake ingress-dedupe-fallbackNo violations
Redmake ingress-trace-negativeCounterexample found
Redmake ingress-trace2-negativeCounterexample found
Redmake ingress-idempotency-negativeCounterexample found
Redmake ingress-dedupe-fallback-negativeCounterexample found

Claim: Routing keeps DM sessions isolated by default. Channel-specific dmScope overrides win over global defaults. identityLinks collapse sessions only within explicitly linked groups, not across unrelated peers.

This model verifies the precedence chain: if you set dmScope: "per-channel-peer" globally but override it to "per-peer" for Telegram, the Telegram setting wins. And if you link Alice's WhatsApp and Telegram identities via identityLinks, their sessions merge — but Bob's sessions stay separate.

RunTargetExpected
Greenmake routing-precedenceNo violations
Greenmake routing-identitylinksNo violations
Redmake routing-precedence-negativeCounterexample found
Redmake routing-identitylinks-negativeCounterexample found

Coverage Summary

DomainClaims ModeledGreen RunsRed Runs
Gateway exposureAuth enforcement on network-bound listeners21
Nodes.run pipelineAllowlist + approval + token anti-replay22
Pairing storeTTL, caps, concurrency, idempotency, refresh66
Ingress gatingMention-required bypass prevention11
Ingress tracesCorrelation, idempotency, dedupe fallback44
Routing/sessionsIsolation, dmScope precedence, identityLinks44
Total1918

Every green run is a bounded proof. Every red run confirms the model catches real bugs. Together, they form an executable, attacker-driven security regression suite for OpenClaw's core authorization and isolation logic.