Roadmap to v2

audience: contributors

These are the simplifications baked into v1 and the planned path to address each. The order here is not the implementation order — it is the order in which each change affects the external behavior of the system.

Footprint scheduling

v1: deterministic slot per (client, round) via keyed-blake3 mod num_slots. Per-client collision probability ≈ N / num_slots; at N = 8, num_slots = 64 that’s ~12%.
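As a sanity check on the ~12% figure, the exact per-client collision probability (the chance that at least one of the other N - 1 clients lands in the same slot) can be computed next to the N / num_slots rule of thumb. This is a standalone numeric sketch, not project code:

```rust
// Exact per-client collision probability: a fixed client collides iff any of
// the other N - 1 clients lands in its slot: 1 - (1 - 1/S)^(N-1).
fn per_client_collision(n: u32, slots: u32) -> f64 {
    1.0 - (1.0 - 1.0 / slots as f64).powi(n as i32 - 1)
}

fn main() {
    let exact = per_client_collision(8, 64);
    let approx = 8.0 / 64.0; // the N / num_slots rule of thumb from the text
    println!("exact = {:.3}, approx = {:.3}", exact, approx);
    assert!((exact - 0.104).abs() < 0.005); // ~10.4% exact vs 12.5% rule of thumb
    assert!(exact < approx); // the rule of thumb slightly overestimates
}
```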

v2: the paper’s two-channel scheduling (§3.2). A side channel of 4 * N slots holds footprint reservations. Clients pick a random slot and an f-bit random footprint each round, write the footprint into the scheduling vector, and in round r+1 use the assigned message slot only if their footprint round-tripped unchanged.
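The round-trip test can be sketched as follows. `SchedClaim`, `aggregate`, and `may_transmit` are hypothetical names, and the scheduling vector is modeled as a plain XOR-aggregated array of f-bit words (f = 16 here for illustration):

```rust
// A client's reservation for one round: a random slot plus an f-bit footprint.
struct SchedClaim {
    slot: usize,
    footprint: u16,
}

// XOR-aggregate all clients' scheduling-vector contributions, as the
// committee would when producing the schedule half of the broadcast.
fn aggregate(claims: &[SchedClaim], num_sched_slots: usize) -> Vec<u16> {
    let mut vec = vec![0u16; num_sched_slots];
    for c in claims {
        vec[c.slot] ^= c.footprint;
    }
    vec
}

// A client uses its message slot in round r+1 only if its footprint
// round-tripped unchanged, i.e. nobody else wrote into the same slot.
fn may_transmit(claim: &SchedClaim, broadcast: &[u16]) -> bool {
    broadcast[claim.slot] == claim.footprint
}

fn main() {
    let claims = vec![
        SchedClaim { slot: 3, footprint: 0xBEEF },
        SchedClaim { slot: 7, footprint: 0x1234 },
        SchedClaim { slot: 3, footprint: 0x5555 }, // collides with the first
    ];
    let v = aggregate(&claims, 32);
    assert!(!may_transmit(&claims[0], &v)); // collision garbled the footprint
    assert!(may_transmit(&claims[1], &v));  // uncontested slot round-trips
}
```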

Implementation shape: add a second RoundParams::num_sched_slots and a second broadcast vector, run the same HKDF-AES pad derivation against a distinct label "zipnet/pad/sched/v1". The CommitteeMachine consumes two aggregates per round (message + schedule) and splits the final broadcast into two halves. WIRE_VERSION bump: 1 → 2.
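The point of the distinct label is domain separation: the same shared secret yields independent message and schedule pads. The following toy uses std's `DefaultHasher` as a stand-in for the real HKDF-AES derivation (it is not a KDF), and the message-channel label `"zipnet/pad/v1"` is a hypothetical counterpart to the schedule label, not taken from the codebase:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for the pad derivation: the label domain-separates the two
// channels, so one secret drives two independent pad streams per round.
fn toy_pad(shared_secret: &[u8], label: &str, round: u64) -> u64 {
    let mut h = DefaultHasher::new();
    shared_secret.hash(&mut h);
    label.hash(&mut h);
    round.hash(&mut h);
    h.finish()
}

fn main() {
    let secret = b"example-shared-secret";
    let msg_pad = toy_pad(secret, "zipnet/pad/v1", 42); // hypothetical label
    let sched_pad = toy_pad(secret, "zipnet/pad/sched/v1", 42);
    assert_ne!(msg_pad, sched_pad); // same secret and round, different channels
}
```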

Cover traffic

v1: non-talking clients omit their envelope entirely. This narrows the anonymity set to active talkers.

v2: clients with no message produce a pure-pad envelope (msg_i = 0, all pads XORed in). The aggregator and committee process these indistinguishably from talker envelopes. The only visible change at the state-machine level: participants grows to include cover traffic.

This is a tiny code change on the client (just remove the “skip when message == None” early return in client::seal) plus a policy decision on how often a client should send cover. Stay-cheap-on-the-server was a first-class design goal of the paper; v2 makes it concrete.
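A minimal sketch of the v2 seal path, under toy assumptions (4-byte envelopes, literal pad arrays instead of the HKDF-AES derivation; `seal` here is an illustration, not the real `client::seal` signature):

```rust
// v2 seal: no early return on None. A cover envelope is msg = 0 with all
// pads XORed in, bit-identical in shape to a talker envelope.
fn seal(message: Option<&[u8; 4]>, pads: &[[u8; 4]]) -> [u8; 4] {
    let mut env = *message.unwrap_or(&[0u8; 4]);
    for pad in pads {
        for i in 0..4 {
            env[i] ^= pad[i];
        }
    }
    env
}

fn main() {
    let pads = [[0xAAu8; 4], [0x0Fu8; 4]];
    let talker = seal(Some(b"hi!!"), &pads);
    let cover = seal(None, &pads);
    // The committee's pad sum cancels both envelopes the same way; only the
    // talker's plaintext survives the final XOR, the cover opens to zeros.
    let pad_sum = [0xAAu8 ^ 0x0F; 4];
    let opened_talker: Vec<u8> = talker.iter().zip(&pad_sum).map(|(a, b)| a ^ b).collect();
    let opened_cover: Vec<u8> = cover.iter().zip(&pad_sum).map(|(a, b)| a ^ b).collect();
    assert_eq!(opened_talker, b"hi!!".to_vec());
    assert_eq!(opened_cover, vec![0u8; 4]);
}
```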

Ratcheting for forward secrecy

v1: every round reruns HKDF-Extract from the same shared_secret. Compromise of the secret compromises all past pads.

v2: at the end of each round, both client and server ratchet:

shared_secret ← HKDF-Extract("zipnet/ratchet/v1", shared_secret);

Past shared secrets are unrecoverable from the new one under the PRF assumption. Both sides must step the ratchet in lockstep; the round number acts as the step counter. A committee member rederiving pads for a late-joining client catches up by evaluating the KDF once per missed round.
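The lockstep-plus-catch-up property can be sketched as follows. `DefaultHasher` stands in for HKDF-Extract with the "zipnet/ratchet/v1" label; only the one-way stepping structure is the point, not the primitive:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy ratchet step: derive the next secret from the label and the old one.
fn ratchet_step(secret: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "zipnet/ratchet/v1".hash(&mut h);
    secret.hash(&mut h);
    h.finish()
}

// A party that missed k rounds catches up by stepping the ratchet k times.
fn catch_up(mut secret: u64, missed_rounds: u64) -> u64 {
    for _ in 0..missed_rounds {
        secret = ratchet_step(secret);
    }
    secret
}

fn main() {
    let s0 = 0xDEAD_BEEF_u64;
    // Client stepped three rounds; a server that missed them catches up.
    let client = ratchet_step(ratchet_step(ratchet_step(s0)));
    let server = catch_up(s0, 3);
    assert_eq!(client, server); // lockstep: round number = step counter
    assert_ne!(client, s0);     // the old secret is gone from the state
}
```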

For the client, the ratchet state sits in the TEE’s sealed storage (v2 TDX path). For the mock client, it sits in RAM — so a restart re-derives an independent key tree, which is fine.

Multi-tier aggregators

v1: single aggregator.

v2: arbitrary rooted tree of aggregators. Each leaf-level aggregator XOR-folds from its assigned clients, pushes up to its parent, parent folds and pushes to root, root publishes to the committee. Filtering uses require(|peer| peer.tags().contains(&tag!("aggregator.tierN"))) and with_tags("aggregator.tierN+1") on online_when.

Each aggregator-to-aggregator link uses a dedicated stream (we already have the pattern in AggregateToServers). No state-machine change required because the root aggregator still emits one AggregateEnvelope per round.
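The reason no state-machine change is needed: XOR is associative and commutative, so any tree shape produces the same root aggregate as a flat fold. A self-contained sketch (envelopes modeled as byte vectors, names illustrative):

```rust
// XOR-fold a batch of equal-length envelopes into one aggregate.
fn xor_fold(envelopes: &[Vec<u8>]) -> Vec<u8> {
    let mut acc = vec![0u8; envelopes.first().map_or(0, |e| e.len())];
    for e in envelopes {
        for (a, b) in acc.iter_mut().zip(e) {
            *a ^= b;
        }
    }
    acc
}

fn main() {
    let clients: Vec<Vec<u8>> = (1u8..=6).map(|i| vec![i, i ^ 0x55, i * 3]).collect();
    // Flat v1-style fold at a single aggregator.
    let flat = xor_fold(&clients);
    // v2 shape: two leaf aggregators (3 clients each) pushing up to a root.
    let leaf_a = xor_fold(&clients[..3]);
    let leaf_b = xor_fold(&clients[3..]);
    let root = xor_fold(&[leaf_a, leaf_b]);
    assert_eq!(flat, root); // tree shape does not change the root aggregate
}
```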

Liveness resilience

v1: any committee server being offline halts round finalization — the state machine waits for len(partials) == len(header.servers).

v2 options:

  • Relaxed finalization. Finalize after t-of-n partials, where t is a configured threshold. A missing server’s pads are retroactively removed via a published “apology partial” submitted by any honest server that knows the remaining clients’ pads. (This requires publishing the missing server’s pad seeds under the committee’s shared secret, which defeats the point — so it needs MPC.)

  • Aggregator-sponsored timeout. The leader signals a timeout, bumps the RoundId, and opens a fresh round without the stuck server’s pads. This is simpler but loses the anonymity contribution of the absent honest server.

The first option is research-complete but not engineering-complete; the second option is trivial and is the candidate for v2.

TDX attestation in the critical path

v1: tee-tdx feature exists but the committee accepts any peer with a well-formed ClientBundle ticket (our BundleValidator only checks id/dh_pub consistency).

v2: on each committee admission path add .require_ticket(Tdx::new().require_mrtd(expected_mrtd)) so only enclave-verified peers can participate. The expected MR_TD comes from the reproducible image build. ClientRegistry writes only land if the bundle’s PeerEntry also carries a valid TDX quote.

This is additive to the existing BundleValidator and stacks cleanly thanks to mosaik’s multi-require_ticket support.

State archival and snapshot sync

v1: CommitteeMachine.broadcasts grows unbounded in RAM; LogReplaySync is used for catch-up.

v2: implement a StateSync strategy that snapshots the last N broadcasts + the current InFlight and emits a blob. Externalize the archival of rotated broadcasts to a sink collection or a replicated object store.
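A hypothetical shape for the bounded window, with the archival sink stubbed as an in-memory vector (`BroadcastWindow` and its methods are illustrative names, not the StateSync API):

```rust
use std::collections::VecDeque;

// Keep only the last `keep` broadcasts in RAM; rotate older ones to a sink.
struct BroadcastWindow {
    keep: usize,
    recent: VecDeque<Vec<u8>>,
    archived: Vec<Vec<u8>>, // stand-in for a sink collection / object store
}

impl BroadcastWindow {
    fn new(keep: usize) -> Self {
        Self { keep, recent: VecDeque::new(), archived: Vec::new() }
    }

    fn push(&mut self, broadcast: Vec<u8>) {
        self.recent.push_back(broadcast);
        if self.recent.len() > self.keep {
            // Rotate the oldest broadcast out to the archival sink.
            self.archived.push(self.recent.pop_front().unwrap());
        }
    }

    // The blob a late joiner syncs instead of replaying the full log.
    fn snapshot(&self) -> Vec<Vec<u8>> {
        self.recent.iter().cloned().collect()
    }
}

fn main() {
    let mut w = BroadcastWindow::new(3);
    for r in 0u8..5 {
        w.push(vec![r; 8]);
    }
    assert_eq!(w.snapshot().len(), 3);       // RAM stays bounded
    assert_eq!(w.archived.len(), 2);         // rounds 0 and 1 rotated out
    assert_eq!(w.snapshot()[0], vec![2u8; 8]); // window starts at round 2
}
```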

Rate-limiting tags

v1: absent. A malicious client can flood envelopes.

v2: per the paper’s §3.1 sketch, each envelope carries PRF_k(ctr || epoch) where ctr is attested by the enclave. The aggregator dedupes by tag per epoch. This requires the TEE path to have landed first.
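The aggregator side of this is a per-epoch seen-set. In this sketch the tag is an opaque u64 standing in for PRF_k(ctr || epoch), and `EpochDeduper` is a hypothetical name:

```rust
use std::collections::HashSet;

// Dedupe envelopes by rate-limit tag within the current epoch.
struct EpochDeduper {
    epoch: u64,
    seen: HashSet<u64>,
}

impl EpochDeduper {
    fn new(epoch: u64) -> Self {
        Self { epoch, seen: HashSet::new() }
    }

    // Admit an envelope only if its tag is fresh within the current epoch.
    fn admit(&mut self, epoch: u64, tag: u64) -> bool {
        if epoch != self.epoch {
            // New epoch: the PRF re-keys the tag space, so the set resets.
            self.epoch = epoch;
            self.seen.clear();
        }
        self.seen.insert(tag)
    }
}

fn main() {
    let mut d = EpochDeduper::new(0);
    assert!(d.admit(0, 42));  // first use of the tag is admitted
    assert!(!d.admit(0, 42)); // replay within the epoch is dropped
    assert!(d.admit(1, 42));  // same tag value in a new epoch is fresh
}
```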

Scheduling vector equivocation protection

v1: a single leader publishes LiveRound into LiveRoundCell; divergent schedules would be detectable via the schedule_hash input to the KDF (if we included it — we pass NO_SCHEDULE in v1). Once footprint scheduling lands, every client must derive schedule_hash from the same broadcast schedule as the committee, or pads disagree and the broadcast is noise (correct failure mode per paper §3.2).

Versioning under stable instance names

v1: every incompatible change (any WIRE_VERSION or signature() bump) produces a new GroupId. Under the UNIVERSE + instance-salt design described in design-intro, this effectively makes the old instance a ghost and forces consumers to re-pin. If "acme.mainnet" is meant to be an operator-level identity that outlives schema changes, v1 cannot deliver it.

v2 must pick one of two reconciliation strategies, documented in design-intro — Versioning under stable instance names:

  • Version-in-name. acme.mainnet-v2 retires acme.mainnet. Clean, but forces a consumer-side release per bump.
  • Lockstep releases. The instance name stays stable across versions and operators + consumers cut matching releases against a shared deployment crate. Avoids id churn at the cost of tighter release-cadence coupling.

Neither is chosen yet. The call is forced the first time a v2 milestone above lands in a production deployment.

Cross-service composition

v1: zipnet is the only service we ship on zipnet::UNIVERSE.

v2: as sibling services (multisig signer, secure storage, attested oracles) land on the same universe, two concerns surface:

  • Catalog noise. Every peer on the universe appears in every agent’s discovery catalog. /mosaik/announce volume scales with the universe, not with the services an agent cares about. The escape hatch is the per-service derived private network for high-churn internal chatter; the residual cost is paid by everyone. If a service’s traffic would dominate the shared network, it belongs behind its own NetworkId — Shape A in design-intro — Two axes of choice — not on the shared one.
  • Cross-service atomicity. “Mix a zipnet message AND rotate a multisig signer” cannot be a single consensus transaction; they are different Groups, possibly with disjoint membership. If a coordination-heavy use case genuinely needs that, the answer is a fourth primitive that is itself a deployment providing atomic composition — not an ad-hoc cross-group protocol.

Optional directory collection (devops convenience)

Not a core feature. Zipnet’s consumer binding path is compile-time name reference plus mosaik peer discovery; no on-network registry is required, and the CLAUDE.md commitment is explicit that one will not be added. However, a shared Map<InstanceName, InstanceCard> listing known deployments may ship as a devops convenience for humans enumerating instances across operators. If built, it must:

  • be documented as a convenience, not a binding path;
  • be independently bindable — the SDK never consults it;
  • not become load-bearing for ACL or attestation decisions.

If it lands, flag it in source as // CONVENIENCE: to distinguish it from the // SIMPLIFICATION: markers used for v2-deferred items.

Migration across these milestones

Each milestone above changes WIRE_VERSION or at minimum CommitteeMachine::signature(). Rolling between v1 and an arbitrary v2 milestone is therefore a coordinated “stop all nodes, start with new config” operation — same procedure as rotating the committee secret. We make no attempt at on-the-fly upgrade paths in this prototype.