Roadmap to v2
audience: contributors
These are the simplifications baked into v1 and the planned path to address each. The order here is not the implementation order — it is the order in which each change affects the external behavior of the system.
Footprint scheduling
v1: deterministic slot per (client, round) via keyed-blake3 mod
num_slots. Per-client collision probability ≈ N / num_slots; at N = 8,
num_slots = 64 that’s ~12%.
v2: the paper’s two-channel scheduling (§3.2). A side channel of
4 * N slots holds footprint reservations. Clients pick a random slot
and an f-bit random footprint each round, write the footprint into
the scheduling vector, and in round r+1 use the assigned message slot
only if their footprint round-tripped unchanged.
Implementation shape: add a second RoundParams::num_sched_slots and
a second broadcast vector, run the same HKDF-AES pad derivation against
a distinct label "zipnet/pad/sched/v1". The CommitteeMachine
consumes two aggregates per round (message + schedule) and splits the
final broadcast into two halves. WIRE_VERSION bump: 1 → 2.
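The round-trip check can be sketched as follows. This is a toy model, not the real implementation: footprints are 16-bit here, the scheduling vector is a plain XOR-broadcast slice, and the function names (reserve, may_transmit) are illustrative. Note the residual failure mode: two colliders that happen to pick the same f-bit footprint go undetected, with probability 2^-f per colliding pair.

```rust
/// Toy model of the §3.2 side channel: each client XORs an f-bit
/// footprint into its chosen scheduling slot. After the broadcast, a
/// client uses its message slot in round r+1 only if its footprint
/// round-tripped unchanged, i.e. it was the lone writer.
fn reserve(sched: &mut [u16], slot: usize, footprint: u16) {
    sched[slot] ^= footprint;
}

fn may_transmit(broadcast: &[u16], slot: usize, footprint: u16) -> bool {
    broadcast[slot] == footprint
}
```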
Cover traffic
v1: non-talking clients omit their envelope entirely. This narrows the anonymity set to active talkers.
v2: clients with no message produce a pure-pad envelope (msg_i = 0,
all pads XORed in). The aggregator and committee process these
indistinguishably from talker envelopes. The only visible change at the
state-machine level: participants grows to include cover traffic.
This is a tiny code change on the client (remove the
“skip when message == None” early return in client::seal) plus a
policy decision on how often a client should send cover. Staying cheap
on the server was a first-class design goal of the paper; v2 makes it
concrete.
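The seal-side behavior can be sketched like this (a toy with fixed 8-byte envelopes; the real pads come from the HKDF-AES derivation, and seal here is only shaped like client::seal):

```rust
/// A cover envelope is the XOR of the client's per-server pads with an
/// all-zero message (msg_i = 0); a talker XORs its message in on top.
/// Byte-for-byte, the two are indistinguishable to aggregator and
/// committee.
fn seal(message: Option<&[u8; 8]>, pads: &[[u8; 8]]) -> [u8; 8] {
    let mut env = message.copied().unwrap_or([0u8; 8]); // cover: msg_i = 0
    for pad in pads {
        for (e, p) in env.iter_mut().zip(pad) {
            *e ^= *p;
        }
    }
    env
}
```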
Ratcheting for forward secrecy
v1: every round reruns HKDF-Extract from the same shared_secret.
Compromise of the secret compromises all past pads.
v2: at the end of each round, both client and server ratchet:
shared_secret ← HKDF-Extract("zipnet/ratchet/v1", shared_secret);
Past shared secrets are unrecoverable from the new one under the PRF
assumption. Both sides must step the ratchet in lockstep; the round
number acts as the step counter. A committee member rederiving state
for a late-joining client catches up by evaluating the KDF once per
elapsed round.
For the client, the ratchet state sits in the TEE’s sealed storage (v2 TDX path). For the mock client, it sits in RAM — so a restart re-derives an independent key tree, which is fine.
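The ratchet and its catch-up path can be sketched as follows. This is a minimal sketch under stated assumptions: a std hasher stands in for HKDF-Extract (it is not a cryptographic PRF), secrets are u64 toys, and ratchet_step/catch_up are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One ratchet step: the real step is
/// shared_secret <- HKDF-Extract("zipnet/ratchet/v1", shared_secret).
/// One-way under the PRF assumption: old secrets are not recoverable
/// from the new one.
fn ratchet_step(secret: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "zipnet/ratchet/v1".hash(&mut h);
    secret.hash(&mut h);
    h.finish()
}

/// Catch-up for a missed client: evaluate the step once per elapsed
/// round, using the round number as the step counter.
fn catch_up(mut secret: u64, missed_rounds: u64) -> u64 {
    for _ in 0..missed_rounds {
        secret = ratchet_step(secret);
    }
    secret
}
```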
Multi-tier aggregators
v1: single aggregator.
v2: arbitrary rooted tree of aggregators. Each leaf-level aggregator
XOR-folds from its assigned clients, pushes up to its parent, parent
folds and pushes to root, root publishes to the committee. Filtering
uses require(|p| p.tags().contains(&tag!("aggregator.tierN"))) and
with_tags("aggregator.tierN+1") on online_when.
Each aggregator-to-aggregator link uses a dedicated stream (we already
have the pattern in AggregateToServers). No state-machine change
required because the root aggregator still emits one AggregateEnvelope
per round.
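The reason no state-machine change is needed is that XOR-folding is associative: folding per tier and then folding the tier results equals one flat fold over all leaves. A toy check, with u64 standing in for an aggregate envelope:

```rust
/// XOR-fold over a set of child aggregates. Because XOR is associative
/// and commutative, the root of any aggregator tree emits the same
/// value as a single flat fold over all leaf envelopes, so the
/// committee still sees exactly one aggregate per round.
fn fold(envelopes: &[u64]) -> u64 {
    envelopes.iter().fold(0, |acc, e| acc ^ e)
}
```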
Liveness resilience
v1: any committee server being offline halts round finalization —
the state machine waits for len(partials) == len(header.servers).
v2 options:
- Relaxed finalization. Finalize after t-of-n partials, where t is a
configured threshold. A missing server’s pads are retroactively removed
via a published “apology partial” submitted by any honest server that
knows the remaining clients’ pads. (This requires publishing the
missing server’s pad seeds under the committee’s shared secret, which
defeats the point — so it needs MPC.)
- Aggregator-sponsored timeout. The leader signals a timeout, bumps the
RoundId, and opens a fresh round without the stuck server’s pads. This
is simpler but loses the anonymity contribution of the absent honest
server.
The first option is research-complete but not engineering-complete; the second option is trivial and is the candidate for v2.
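The candidate's decision point can be sketched as a pure function. Everything here is illustrative (Next, after_deadline, and the bool-per-server report vector are hypothetical names, not the real state machine):

```rust
/// Leader's choice once the round deadline passes: finalize if every
/// committee server reported a partial, otherwise bump the round id and
/// reopen without the stuck servers' pads.
enum Next {
    Finalize,
    Reopen { new_round: u64, excluded: Vec<usize> },
}

fn after_deadline(round: u64, reported: &[bool]) -> Next {
    if reported.iter().all(|&r| r) {
        return Next::Finalize;
    }
    // Collect the indices of servers whose partials never arrived.
    let excluded = reported
        .iter()
        .enumerate()
        .filter(|&(_, &up)| !up)
        .map(|(i, _)| i)
        .collect();
    Next::Reopen { new_round: round + 1, excluded }
}
```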
TDX attestation in the critical path
v1: tee-tdx feature exists but the committee accepts any peer
with a well-formed ClientBundle ticket (our BundleValidator only
checks id/dh_pub consistency).
v2: on each committee admission path add
.require_ticket(Tdx::new().require_mrtd(expected_mrtd)) so only
enclave-verified peers can participate. The expected MR_TD comes from
the reproducible image build.
ClientRegistry writes only land if the bundle’s PeerEntry also
carries a valid TDX quote.
This is additive to the existing BundleValidator and stacks cleanly
thanks to mosaik’s multi-require_ticket support.
State archival and snapshot sync
v1: CommitteeMachine.broadcasts grows unbounded in RAM;
LogReplaySync is used for catch-up.
v2: implement a StateSync strategy that snapshots the last N
broadcasts + the current InFlight and emits a blob. Externalize the
archival of rotated broadcasts to a sink collection or a replicated
object store.
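The snapshot boundary can be sketched as a rotation over the broadcast log. A sketch under stated assumptions: Snapshot and rotate are hypothetical names, broadcasts are opaque byte blobs, and the InFlight portion is omitted.

```rust
/// Split the broadcast log: keep the last `keep` broadcasts in RAM for
/// the StateSync blob, hand everything older to the archival sink
/// (sink collection or replicated object store).
struct Snapshot {
    recent: Vec<Vec<u8>>,   // last N broadcasts, newest last
    archived: Vec<Vec<u8>>, // rotated out, destined for the sink
}

fn rotate(mut broadcasts: Vec<Vec<u8>>, keep: usize) -> Snapshot {
    let cut = broadcasts.len().saturating_sub(keep);
    let archived: Vec<Vec<u8>> = broadcasts.drain(..cut).collect();
    Snapshot { recent: broadcasts, archived }
}
```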
Rate-limiting tags
v1: absent. A malicious client can flood envelopes.
v2: per the paper’s §3.1 sketch, each envelope carries
PRF_k(ctr || epoch) where ctr is attested by the enclave. The
aggregator dedupes by tag per epoch. This requires the TEE path to have
landed first.
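The aggregator-side dedupe is a per-epoch set keyed by the opaque tag. A sketch, assuming the tag arrives as an opaque value (u64 here for brevity; EpochDedup and accept are illustrative names, and the PRF itself lives in the enclave):

```rust
use std::collections::HashSet;

/// Accepts an envelope only if its rate-limit tag, PRF_k(ctr || epoch),
/// has not been seen this epoch. An attested ctr means an honest client
/// can produce at most one fresh tag per (ctr, epoch), so floods dedupe
/// to a bounded set.
struct EpochDedup {
    epoch: u64,
    seen: HashSet<u64>,
}

impl EpochDedup {
    fn new() -> Self {
        EpochDedup { epoch: 0, seen: HashSet::new() }
    }

    /// Returns true iff the tag is fresh; state resets when the epoch
    /// advances, since tags are bound to the epoch by the PRF input.
    fn accept(&mut self, epoch: u64, tag: u64) -> bool {
        if epoch != self.epoch {
            self.epoch = epoch;
            self.seen.clear();
        }
        self.seen.insert(tag)
    }
}
```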
Scheduling vector equivocation protection
v1: a single leader publishes LiveRound into LiveRoundCell;
divergent schedules would be detectable via the schedule_hash input
to the KDF (if we included it — we pass NO_SCHEDULE in v1). Once
footprint scheduling lands, every client must derive schedule_hash
from the same broadcast schedule as the committee, or pads disagree and
the broadcast is noise (correct failure mode per paper §3.2).
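The binding can be sketched as follows. This is a toy: a std hasher stands in for the real HKDF-AES pad derivation, and derive_pad is a hypothetical name. The point is only that schedule_hash is a KDF input, so an equivocating schedule changes every honest pad.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pad derivation with the schedule bound in: client and committee only
/// produce matching (cancelling) pads when they agree on the round AND
/// on schedule_hash. Under equivocation the pads differ, so the XOR
/// broadcast degrades to noise instead of a mis-slotted message.
fn derive_pad(secret: u64, round: u64, schedule_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (secret, round, schedule_hash).hash(&mut h);
    h.finish()
}
```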
Versioning under stable instance names
v1: every incompatible change (any WIRE_VERSION or
signature() bump) produces a new GroupId. Under the UNIVERSE +
instance-salt design described in
design-intro,
this effectively makes the old instance a ghost and forces consumers
to re-pin. If "acme.mainnet" is meant to be an operator-level
identity that outlives schema changes, v1 cannot deliver it.
v2 must pick one of two reconciliation strategies, documented in design-intro — Versioning under stable instance names:
- Version-in-name. acme.mainnet-v2 retires acme.mainnet. Clean, but
forces a consumer-side release per bump.
- Lockstep releases. The instance name stays stable across versions and
operators + consumers cut matching releases against a shared deployment
crate. Avoids id churn at the cost of tighter release-cadence coupling.
Neither is chosen yet. The call is forced the first time a v2 milestone above lands in a production deployment.
Cross-service composition
v1: zipnet is the only service we ship on zipnet::UNIVERSE.
v2: as sibling services (multisig signer, secure storage, attested oracles) land on the same universe, two concerns surface:
- Catalog noise. Every peer on the universe appears in every agent’s
discovery catalog. /mosaik/announce volume scales with the universe,
not with the services an agent cares about. The escape hatch is the
per-service derived private network for high-churn internal chatter;
the residual cost is paid by everyone. If a service’s traffic would
dominate the shared network, it belongs behind its own NetworkId —
Shape A in design-intro — Two axes of choice — not on the shared one.
- Cross-service atomicity. “Mix a zipnet message AND rotate a multisig
signer” cannot be a single consensus transaction; they are different
Groups, possibly with disjoint membership. If a coordination-heavy use
case genuinely needs that, the answer is a fourth primitive that is
itself a deployment providing atomic composition — not an ad-hoc
cross-group protocol.
Optional directory collection (devops convenience)
Not a core feature. Zipnet’s consumer binding path is compile-
time name reference plus mosaik peer discovery; no on-network
registry is required, and the
CLAUDE.md commitment is explicit that one will
not be added. However, a shared Map<InstanceName, InstanceCard>
listing known deployments may ship as a devops convenience for
humans enumerating instances across operators. If built, it must:
- be documented as a convenience, not a binding path;
- be independently bindable — the SDK never consults it;
- not become load-bearing for ACL or attestation decisions.
If it lands, flag it in source as // CONVENIENCE: to distinguish it
from the // SIMPLIFICATION: v2-deferred markers.
Migration across these milestones
Each milestone above changes WIRE_VERSION or at minimum
CommitteeMachine::signature(). Rolling between v1 and an arbitrary
v2 milestone is therefore a coordinated “stop all nodes, start with new
config” operation — same procedure as
rotating the committee secret.
We make no attempt at on-the-fly upgrade paths in this prototype.