Cross-Chain Bridge Incident Response Playbook
This page is the response-oriented guide in the bridge security cluster: what to do when unsafe cross-chain movement is already under way. It covers containment, queue control, validator coordination, communication discipline, and safe recovery sequencing.
A cross-chain bridge incident response playbook is the containment sequence a team uses to stop unsafe message flow, verify which trust assumptions have failed, coordinate emergency authority, and reopen safely only after the affected bridge lane is understood. Strong response depends on pairing fast containment with slower governance and recovery controls.
Why Are Bridge Incidents Hard to Contain?
Bridge incidents are operationally hard because value can keep moving after one visible component is paused. Multiple chains, queued messages, validator paths, and delayed execution lanes mean that partial containment often leaves exploitable routes open.
This page exists as the response companion to cross-chain message validation security. The validation page explains how trust should be enforced before incidents happen. This page explains what to do when a dangerous path is already active.
What Should Teams Do in the First Response Window?
| Phase | Goal | Primary Output | Owner |
|---|---|---|---|
| Detect | Confirm exploit pattern and blast radius | Incident declaration | Monitoring / SecOps |
| Contain | Stop additional unsafe movement | Pause matrix + action log | Bridge operations |
| Stabilize | Control queues and signer state | Queue snapshot + nonce/route locks | Protocol engineering |
| Recover | Restore minimum-safe functionality | Controlled reopen checklist | Incident commander |
The first 30 minutes should reduce attacker optionality, not answer every forensic question. Teams should identify which routes can still move value, freeze the highest-risk lanes first, and log every containment action with verifiable evidence.
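The ordering and logging described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Route` fields, the `ActionLog` class, and the choice to rank routes by exposed value are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Route:
    name: str              # hypothetical route label, e.g. "A->B"
    exposed_value: float   # value that could still move on this route
    can_move_value: bool   # True if messages on this route can still execute


@dataclass
class ActionLog:
    entries: list = field(default_factory=list)

    def record(self, action, route, evidence):
        # Store a digest of the evidence so the entry can be verified later.
        digest = hashlib.sha256(
            json.dumps(evidence, sort_keys=True).encode("utf-8")
        ).hexdigest()
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "route": route,
            "evidence_sha256": digest,
        }
        self.entries.append(entry)
        return entry


def freeze_order(routes):
    """Return routes that can still move value, highest exposure first."""
    live = [r for r in routes if r.can_move_value]
    return sorted(live, key=lambda r: r.exposed_value, reverse=True)
```

A responder would walk `freeze_order(routes)` from the top and call `log.record("pause_route", route.name, evidence)` after each pause, so that every containment step leaves a timestamped, hash-anchored entry.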
Why Do Queue Control and Replay Safety Matter During Incidents?
One of the most common bridge response mistakes is assuming that a frontend pause or a deposit pause equals containment. It does not. If a queue of pre-existing signed or otherwise valid messages can still execute, the bridge may keep losing value after the “pause” announcement.
- Snapshot queue state before mutation.
- Freeze or invalidate exposed message lanes deliberately.
- Verify signer and validator integrity before trusting new attestations.
- Preserve evidence while containment is happening.
{
  "incident_mode": "bridge-protective",
  "actions": ["pause_high_risk_routes", "snapshot_queue", "verify_signers"],
  "reopen_policy": "tiered_only"
}
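The first two items in the list above, snapshotting before mutation and freezing a lane deliberately, can be illustrated as follows. This is a sketch under stated assumptions: the queue is modeled as a list of plain dictionaries with hypothetical `route` and `status` keys, which will not match any particular bridge's storage layout.

```python
import copy
import hashlib
import json


def snapshot_queue(queue):
    """Return a deep copy of the queue and a SHA-256 digest of its contents."""
    frozen_copy = copy.deepcopy(queue)
    canonical = json.dumps(frozen_copy, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return frozen_copy, digest


def freeze_lane(queue, route):
    """Mark pending messages on one route as frozen; return how many changed."""
    changed = 0
    for msg in queue:
        if msg["route"] == route and msg["status"] == "pending":
            msg["status"] = "frozen"
            changed += 1
    return changed
```

Taking the snapshot first means the pre-containment state and its digest survive as evidence even though `freeze_lane` mutates the live queue in place.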
Which Containment Actions Should Happen Before Public Reopen Plans?
Teams should not jump from emergency pause to reopen messaging while the technical containment picture is still weak. Before any public reopen path is announced, the team should confirm which message lanes are invalidated, which queues are preserved for investigation, which signer or validator assumptions have changed, and whether bridge pause authority is still bounded correctly.
- Confirm that high-risk routes are frozen or invalidated, not just hidden at the interface layer.
- Verify whether queued messages can still execute under stale assumptions.
- Reconcile signer, relayer, and validator trust before reopening any route with material value exposure.
- State which evidence threshold must pass before moving from containment to controlled recovery.
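One way to keep reopen messaging from outrunning containment is to encode the checklist above as an explicit gate. The sketch below is illustrative only; the field names are hypothetical labels for the four checks, and how each one is actually verified is left to the team.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ContainmentStatus:
    # Each flag mirrors one checklist item; names are assumptions for this sketch.
    high_risk_routes_invalidated: bool
    stale_queue_execution_blocked: bool
    signer_trust_reconciled: bool
    evidence_threshold_met: bool


def blocking_items(status):
    """List the checklist items that are still unmet."""
    return [f.name for f in fields(status) if not getattr(status, f.name)]


def may_announce_reopen(status):
    """A reopen plan may be announced only when nothing is blocking."""
    return not blocking_items(status)
```

Returning the list of unmet items, rather than a bare boolean, gives the incident log a concrete record of what still blocks recovery.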
How Should Teams Separate Incident Command from Reopen Authority?
One of the most dangerous bridge-response failures is letting the same pressure-driven command path that runs emergency containment also own the final decision to restore normal value movement. Incident command should be optimized for speed, coordination, and containment. Reopen authority should be optimized for evidence review, skepticism, and recovery discipline.
- Incident command: declares the incident, coordinates containment, assigns technical owners, and controls the active response tempo.
- Reopen authority: decides when the bridge is safe enough to resume route activity and under what caps or supervision.
- Operational rule: the team that is best at stopping the bleed is not automatically the right team to judge that normal trust assumptions are restored.
This separation matters because bridge incidents create immense pressure to show forward motion. Without a clear governance boundary, “response progress” easily turns into premature pressure to reopen.
How Should Teams Define Decision Rights Before Incident Day?
Teams need explicit ownership before the bridge is under attack. At minimum, someone must own incident command, validator coordination, queue control, communications, and evidence collection. If ownership of those lanes is vague, response slows and containment actions become uncoordinated.
- Who can trigger protective mode?
- Who can pause routes or adjust delay queues?
- Who owns validator or signer coordination?
- Who publishes public updates and next checkpoints?
Decision rights should also separate fast containment from reopen authority, just as good protocol pause design separates emergency action from long-term governance judgment.
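Decision rights are easier to audit when they are written down as data before incident day. The sketch below assumes hypothetical role and action names, and it enforces one strict reading of the separation rule: no role holding a containment right may also approve reopen. A team could reasonably choose a looser rule.

```python
# Hypothetical mapping of actions to the roles allowed to perform them.
DECISION_RIGHTS = {
    "trigger_protective_mode": {"incident_commander"},
    "pause_routes": {"incident_commander", "bridge_operations"},
    "adjust_delay_queues": {"bridge_operations"},
    "coordinate_signers": {"validator_coordinator"},
    "publish_updates": {"communications_lead"},
    "approve_reopen": {"reopen_authority"},
}

# Actions treated as fast containment in this sketch.
CONTAINMENT_ACTIONS = {"trigger_protective_mode", "pause_routes", "adjust_delay_queues"}


def is_authorized(role, action, rights=DECISION_RIGHTS):
    """True if the role is listed for the action; unknown actions are denied."""
    return role in rights.get(action, set())


def separation_holds(rights=DECISION_RIGHTS):
    """True if no containment role can also approve reopen."""
    containment_roles = set()
    for action in CONTAINMENT_ACTIONS:
        containment_roles |= rights.get(action, set())
    return containment_roles.isdisjoint(rights.get("approve_reopen", set()))
```

Running `separation_holds()` as part of a routine review catches the case where a later edit quietly grants reopen approval to a containment role.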
How Should Recovery Happen Without Reopening Risk Too Early?
Recovery should be tiered. Re-enable lower-risk routes first, keep value caps tight, and tighten monitoring thresholds during the recovery window. Users generally tolerate slower recovery if they believe the recovery plan is coherent and explicitly risk-aware.
| Recovery Tier | What reopens | Condition |
|---|---|---|
| Tier 1 | Lowest-risk routes | Stable telemetry + queue confidence |
| Tier 2 | Normal routes with caps | No anomaly resurgence during observation window |
| Tier 3 | Higher-value or more complex routes | Signer confidence and policy checks restored |
If the bridge still shows confidence gaps in signer trust, validation behavior, or replay safety, the next read should be message validation security or validator compromise defense, not a hasty reopen checklist. Teams should also define what a controlled reopen means for message queues, route caps, and bridge pause authority, rather than treating “resume” as one binary event.
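Treating reopen as a sequence of gated steps, rather than one binary event, can be sketched as a small state machine. The condition names below are hypothetical labels for the table's conditions, and the choice to make requirements cumulative (Tier 2 still requires the Tier 1 conditions) is an assumption of this example, not something the table specifies.

```python
from enum import IntEnum


class Tier(IntEnum):
    CONTAINED = 0
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


# Conditions introduced at each tier (labels are assumptions for this sketch).
CONDITIONS = {
    Tier.TIER_1: {"stable_telemetry", "queue_confidence"},
    Tier.TIER_2: {"no_anomaly_resurgence"},
    Tier.TIER_3: {"signer_confidence", "policy_checks_restored"},
}


def required_for(tier):
    """All conditions needed to be at `tier`, including those of lower tiers."""
    needed = set()
    for t, conds in CONDITIONS.items():
        if t <= tier:
            needed |= conds
    return needed


def next_tier(current, satisfied):
    """Advance by at most one tier, and only if every required condition holds."""
    if current == Tier.TIER_3:
        return current
    candidate = Tier(current + 1)
    return candidate if required_for(candidate) <= set(satisfied) else current
```

Because `next_tier` moves one step at a time, each tier gets its own observation window before the next set of routes is considered.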
Once the emergency phase is stable, return to the page in the bridge cluster that covers the underlying failure class. Validation problems route back to message validation security. Source-settlement uncertainty routes back to finality and reorg defense. Validator trust failures route back to validator compromise defense. Containment-design questions route back to pause authority design.
When Should Incident Response Cancel Queued Bridge Work?
Incident response is incomplete if teams only pause interfaces and signer paths while leaving compromised or ambiguous queued work untouched. Response design should include a clear rule for when pending messages are invalidated, how scope is defined, and how replay-safe recovery is preserved. For that control layer, see bridge emergency queue invalidation design.
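A scoped invalidation rule can be sketched as a registry that rejects any message falling inside an invalidated route and nonce range. Everything here is an assumption for illustration: real bridges identify messages differently, and the `Scope`, `Message`, and `InvalidationRegistry` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    route: str
    nonce: int


@dataclass(frozen=True)
class Scope:
    route: str
    min_nonce: int
    max_nonce: int  # inclusive upper bound

    def covers(self, msg):
        return msg.route == self.route and self.min_nonce <= msg.nonce <= self.max_nonce


class InvalidationRegistry:
    def __init__(self):
        self._scopes = []

    def invalidate(self, scope):
        # Scopes are only ever added, so an invalidated message stays rejected
        # even if it is resubmitted later (the replay-safety property).
        self._scopes.append(scope)

    def is_executable(self, msg):
        return not any(scope.covers(msg) for scope in self._scopes)
```

Checking `is_executable` at execution time, rather than deleting queue entries, keeps the original queued work available as evidence.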
Teams also need a handoff from emergency containment into trust recovery. If the incident has already forced signer eviction and the next question is how a new quorum becomes credible again, continue to bridge signer rotation and trust reconstitution.
If the concern is that a valid message may still trigger unsafe effects on the receiving chain, continue to cross-chain destination execution guardrails before route reopen assumptions are widened.
Frequently Asked Questions
Should teams pause the whole bridge immediately?
Teams should pause the routes and execution lanes that still expose value, then decide on full pause based on queue state and exploit path.
What is the most common bridge response mistake?
Assuming frontend pauses equal containment while message queues, signer paths, or delayed execution lanes remain active.