Rotations and upgrades
audience: operators
Every routine change in a running instance falls into one of these procedures. Follow them verbatim; the consensus and cryptographic layers are unforgiving of accidental divergence.
Rolling a committee server (restart, same identity)
Safe at any time: Raft tolerates a minority of servers restarting without operator intervention.
- Stop the target server with SIGTERM and wait for it to exit gracefully (this should take under 5 s).
- Replace the binary, restart the container, or apply whatever change triggered the rollout.
- Start the server with the same ZIPNET_INSTANCE, ZIPNET_SECRET, and ZIPNET_COMMITTEE_SECRET as before.
- Watch mosaik_groups_leader_is_local on the remaining servers; leader election should settle within a few seconds.
- Once the restarted server’s log shows "round finalized", move on to the next server.
Do not take down so many committee servers at once that fewer than a majority remain running; that drops quorum and halts round progression until a majority is back up.
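The majority rule is plain Raft arithmetic. The sketch below is a minimal illustration and is not part of zipnet or mosaik; the helper names are hypothetical. A committee of n servers needs floor(n/2) + 1 of them running to keep quorum, so at most the remainder may be down at any moment.

```rust
/// Smallest number of running servers that forms a Raft majority of `members`.
fn quorum(members: usize) -> usize {
    members / 2 + 1
}

/// How many servers may be down at once while a quorum stays up.
fn max_down(members: usize) -> usize {
    members.saturating_sub(quorum(members))
}

/// Hypothetical pre-flight check before sending SIGTERM to one more server:
/// `already_down` counts servers currently stopped or still catching up.
fn safe_to_stop_one_more(members: usize, already_down: usize) -> bool {
    already_down < max_down(members)
}
```

For example, a four-server committee tolerates only one server down, the same as a three-server committee, so adding a fourth server does not buy extra restart headroom. The same check applies before removing a server.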
Adding a committee server
- Provision the new node and assign it a fresh ZIPNET_SECRET seed.
- Give it the same ZIPNET_INSTANCE and ZIPNET_COMMITTEE_SECRET as the existing committee servers.
- Start it with --bootstrap <peer_id_of_any_existing_server>.
- Wait for the new server’s log to print "round finalized"; at that point it has caught up.
- Update your operational runbook, monitoring targets, and audit log to reflect the added node.
The ServerRegistry collection automatically reflects the new
member within one round. Clients start including the new server in
their pad derivation from the next OpenRound the leader issues.
Removing a committee server
- Announce the removal at least one gossip cycle ahead (default 15 s) so catalog entries expire cleanly.
- Send SIGTERM to the target node.
- Verify that the remaining servers still form a majority and continue to finalize rounds ("round finalized" events in the logs).
Security warning
A removed server retains its DH secret. If that secret is not wiped, an adversary who later compromises the decommissioned machine can replay historic rounds and compute that server’s share of past pads. Combined with any other committee server’s DH secret compromise, this would break anonymity of past rounds. Wipe DH secrets on decommission.
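To see why a retained secret matters, assume a DC-net style construction in which a client’s pad for a round is the XOR of one share per committee server, each share derived from the secret that client shares with that server. The toy sketch below illustrates only that assumption and is not zipnet’s pad derivation: it uses std’s DefaultHasher as a stand-in for a keyed PRF (DefaultHasher is NOT cryptographically secure), and pad_share, mask, and unmask are hypothetical names. It shows that whoever holds a server’s secret can recompute that server’s share for any past round and strip it from a recorded ciphertext.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a keyed PRF: expands (secret, round) into `len` bytes.
/// DefaultHasher is not a cryptographic PRF; illustration only.
fn pad_share(secret: u64, round: u64, len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(len);
    let mut counter: u64 = 0;
    while out.len() < len {
        let mut h = DefaultHasher::new();
        (secret, round, counter).hash(&mut h);
        out.extend_from_slice(&h.finish().to_le_bytes());
        counter += 1;
    }
    out.truncate(len);
    out
}

/// XORs `pad` into `buf` in place.
fn xor_into(buf: &mut [u8], pad: &[u8]) {
    for (b, p) in buf.iter_mut().zip(pad) {
        *b ^= *p;
    }
}

/// Client side: mask the message with one share per committee server.
fn mask(message: &[u8], server_secrets: &[u64], round: u64) -> Vec<u8> {
    let mut ct = message.to_vec();
    for &s in server_secrets {
        xor_into(&mut ct, &pad_share(s, round, message.len()));
    }
    ct
}

/// Strips the shares for whichever secrets are known (XOR is its own inverse).
fn unmask(ciphertext: &[u8], known_secrets: &[u64], round: u64) -> Vec<u8> {
    mask(ciphertext, known_secrets, round)
}
```

Each secret an adversary obtains removes one share from recorded ciphertexts; wiping the decommissioned server’s secret keeps its share unrecoverable.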
Rotating a committee server’s long-term key
v1 does not have first-class key rotation. The procedure is “decommission + re-add”:
- Remove the old server (above).
- Add a new server with a fresh ZIPNET_SECRET (above).
The committee’s GroupId does not change (it depends on the
instance name and shared ZIPNET_COMMITTEE_SECRET, not on
individual node identities), so the Raft group persists across the
swap. The ServerRegistry entry is updated automatically.
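The dependency can be illustrated with a toy derivation. This is a minimal sketch, assuming the GroupId is a hash over the instance name, the committee secret, and the committee’s state-machine signature, with nothing node-specific mixed in; derive_group_id is hypothetical, and std’s DefaultHasher stands in for whatever cryptographic hash mosaik actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy GroupId: depends only on committee-wide inputs, never on a node's own
/// ZIPNET_SECRET. DefaultHasher is a stand-in for a real cryptographic hash.
fn derive_group_id(instance: &str, committee_secret: &str, machine_signature: &str) -> u64 {
    let mut h = DefaultHasher::new();
    instance.hash(&mut h);
    committee_secret.hash(&mut h);
    machine_signature.hash(&mut h);
    h.finish()
}
```

Swapping one node’s ZIPNET_SECRET changes none of the three inputs, so the Raft group survives the swap; changing ZIPNET_COMMITTEE_SECRET or the state-machine signature changes the output, which is why those procedures abandon the old committee.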
Rotating the committee secret
This is disruptive: changing ZIPNET_COMMITTEE_SECRET changes the
GroupId, so the old committee is abandoned. External publishers
compiled against the instance name still bond, but the committee
they find is new.
- Announce a maintenance window.
- Stop every client, aggregator, and committee server on this instance.
- Distribute the new ZIPNET_COMMITTEE_SECRET to all committee members.
- Start the committee first, then the aggregator, then the clients.
Rotating round parameters
RoundParams (num_slots, slot_bytes, tag_len) is folded into
the committee’s state-machine signature. Changing it is equivalent
to rotating the committee secret (above), and it is a breaking
change for any publisher that compiled the old parameters in, so in
practice it means retiring and replacing the instance; see Retiring
and replacing an instance below.
Dev note
Developers changing RoundParams in code must also bump the signature string in CommitteeMachine::signature() when appropriate; otherwise old and new nodes silently derive the same GroupId but disagree on apply semantics. See The committee state machine.
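One way to make the dev note hard to forget is to derive the signature string from RoundParams instead of maintaining it entirely by hand. This is a sketch of that idea only: the field types, the default_v1 values, MACHINE_VERSION, and the signature format below are hypothetical and do not reflect zipnet’s actual code.

```rust
/// Hypothetical mirror of zipnet's RoundParams; field types are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RoundParams {
    num_slots: u32,
    slot_bytes: u32,
    tag_len: u32,
}

impl RoundParams {
    /// Placeholder values for illustration, not zipnet's real defaults.
    const fn default_v1() -> Self {
        RoundParams { num_slots: 64, slot_bytes: 256, tag_len: 16 }
    }
}

/// Hand-maintained version: bump when apply semantics change for reasons
/// RoundParams does not capture.
const MACHINE_VERSION: u32 = 1;

/// Signature string that changes automatically whenever RoundParams does.
fn committee_signature(p: &RoundParams) -> String {
    format!(
        "zipnet.committee.v{}/slots={}/slot_bytes={}/tag_len={}",
        MACHINE_VERSION, p.num_slots, p.slot_bytes, p.tag_len
    )
}
```

With this shape a RoundParams change cannot accidentally leave the signature, and therefore the GroupId, unchanged; semantic changes outside the parameters still need a manual MACHINE_VERSION bump.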
Rebuilding a TDX image
Rebuilding the committee or client image produces a new MR_TD. The committee’s ticket validator is pinned to a specific MR_TD, so a rebuild requires a coordinated rollout:
- Build the new image with cargo build --release (the mosaik TDX builder runs in build.rs and produces a fresh mrtd.hex).
- Publish the new mrtd.hex to your release-notes channel.
- Decide whether the change is ABI-compatible with the current committee’s expectations:
  - Patch-level image change (kernel patch, initramfs tweak, no wire-format or state-machine change): accept both MR_TDs transiently by updating the committee’s require_mrtd list to include the new hash, roll the committee hosts one at a time to the new image, then drop the old MR_TD from the allow-list.
  - Breaking change (new state-machine signature, new wire format, new RoundParams): treat it as retiring the instance (below).
- Sign and publish the new MR_TD, along with the retirement window for the old one, so publishers can rebuild their own images in time.
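The patch-level path is a three-phase allow-list transition: accept only the old MR_TD, then both while hosts roll, then only the new one. The sketch below models that sequence under the assumption that require_mrtd behaves like a set of hex MR_TD strings matched exactly; RequireMrtd and its methods are hypothetical stand-ins, not the mosaik ticket-validator API.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the committee's require_mrtd allow-list.
#[derive(Debug)]
struct RequireMrtd {
    allowed: BTreeSet<String>,
}

impl RequireMrtd {
    /// Phase 1: only the currently deployed image is accepted.
    fn new(initial: &str) -> Self {
        let mut allowed = BTreeSet::new();
        allowed.insert(initial.to_string());
        RequireMrtd { allowed }
    }

    /// Phase 2: accept the new image alongside the old one during the roll.
    fn admit(&mut self, mrtd: &str) {
        self.allowed.insert(mrtd.to_string());
    }

    /// Phase 3: once every host runs the new image, drop the old hash.
    fn retire(&mut self, mrtd: &str) {
        self.allowed.remove(mrtd);
    }

    /// A ticket validates only if its MR_TD is currently allowed.
    fn accepts(&self, mrtd: &str) -> bool {
        self.allowed.contains(mrtd)
    }
}
```

The middle phase exists so that hosts still on the old image and hosts already on the new one are both admitted while the committee rolls; retiring the old hash before the last host moves would lock that host out.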
Retiring and replacing an instance
Use this path whenever a cross-compatibility boundary moves
(RoundParams, CommitteeMachine::signature, wire format, breaking
MR_TD change). You have two idiomatic versioning stories:
- Version in the name. Stand up the new deployment under a new instance name (for example acme.mainnet.v2). Old and new run in parallel for the transition window; publishers re-pin and rebuild at their own pace; you tear down the old instance once traffic has drained. This is the cleanest story for external publishers, but it forces them to cut a release.
- Lockstep release against a shared deployment crate. Keep the instance name stable, cut a new deployment-crate version pinning the new state-machine signature, and coordinate operator and publisher upgrades as a single release event. This avoids instance-ID churn at the cost of tighter coupling between release cadences.
Zipnet v1 does not mandate which you pick; see Designing coexisting systems on mosaik — Versioning under stable instance names for the full tradeoff.
Retirement itself is just stopping every server under the old
instance name. Publishers still trying to bond see ConnectTimeout;
they rebuild against the new name or the new deployment crate and
reconnect.
Upgrading the binary
Patch-level upgrades (no CommitteeMachine::signature change, no
RoundParams change, no wire format change, no MR_TD change if
TDX-gated) are safe to roll one node at a time following the restart
procedure.
Upgrades that change any of those four cross a compatibility boundary — treat them like retiring the instance.
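The four-way test reduces to comparing a compatibility fingerprint between the running build and the candidate. This is a sketch with hypothetical types, not zipnet’s API; mr_td is None when the deployment is not TDX-gated. Following the patch-level definition in Rebuilding a TDX image, a change to MR_TD alone is mapped to the allow-list rollout rather than to a full retirement.

```rust
/// Hypothetical bundle of everything that must match across nodes.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CompatFingerprint {
    wire_version: u32,
    machine_signature: String,
    round_params: (u32, u32, u32), // (num_slots, slot_bytes, tag_len)
    mr_td: Option<String>,         // None when not TDX-gated
}

#[derive(Debug, PartialEq, Eq)]
enum UpgradeKind {
    /// Roll one node at a time with the restart procedure.
    RollingRestart,
    /// Image-only change: widen require_mrtd, roll hosts, then narrow it.
    MrtdAllowListRollout,
    /// Crosses a compatibility boundary: retire and replace the instance.
    RetireInstance(Vec<&'static str>),
}

fn classify_upgrade(old: &CompatFingerprint, new: &CompatFingerprint) -> UpgradeKind {
    // Boundaries that always force a new instance (MR_TD handled separately).
    let mut boundaries = Vec::new();
    if old.wire_version != new.wire_version {
        boundaries.push("wire format");
    }
    if old.machine_signature != new.machine_signature {
        boundaries.push("CommitteeMachine::signature");
    }
    if old.round_params != new.round_params {
        boundaries.push("RoundParams");
    }
    if !boundaries.is_empty() {
        UpgradeKind::RetireInstance(boundaries)
    } else if old.mr_td != new.mr_td {
        UpgradeKind::MrtdAllowListRollout
    } else {
        UpgradeKind::RollingRestart
    }
}
```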
Dev notes on where to look in source:
- WIRE_VERSION in crates/zipnet-proto/src/lib.rs
- CommitteeMachine::signature in crates/zipnet-node/src/committee.rs
- RoundParams::default_v1 in crates/zipnet-proto/src/params.rs
Any change to those requires a coordinated restart of the whole instance.
See also
- Running a committee server
- Incident response — what to do when a restart doesn’t bring the node back cleanly.
- Designing coexisting systems on mosaik — Versioning under stable instance names