Introduction
audience: all
Zipnet is an anonymous broadcast channel for bounded sets of authenticated participants. A group of clients publish messages onto a shared log; nobody — not even the operators of the infrastructure, acting individually — can tell which client authored which message.
This book documents a working prototype of ZIPNet built as a mosaik-native application. The protocol follows Rosenberg, Shih, Zhao, Wang, Miers, and Zhang (2024) with a small, grep-able set of v1 simplifications tracked in Roadmap to v2.
What zipnet is for
The canonical motivating case is an encrypted mempool: TEE-attested wallets seal transactions and publish them through zipnet; builders read an ordered log of sealed transactions; no party — not even a compromised builder — can link a transaction back to its author until on-chain execution reveals whatever the transaction itself reveals. The encryption layer (threshold decryption, TEE unsealing, plaintext-if-you-want) sits on top; zipnet supplies the anonymous, ordered, sybil-resistant publish channel underneath.
Other deployments in the same shape:
- Permissioned order-flow auctions. Whitelisted searchers publish intents; builders bid without knowing which searcher sent what.
- Anonymous governance signalling. Token-holder wallets cast signals a delegate can tally without learning which wallet sent any given one.
- Private sealed-bid auctions. Bidders publish; outcomes are public; bid-to-bidder linkage is cryptographic.
What zipnet uniquely provides across these:
- Sender anonymity within an attested set. A compromised reader cannot tie a message back to its author unless every committee operator colludes (any-trust).
- Shared ordered view. Every subscriber sees the same log in the same order.
- Sybil resistance. Only TEE-attested clients can publish.
- Censorship resistance at the publish layer. Readers cannot drop messages from specific authors because authorship is unlinkable.
The deployment model in one paragraph
Zipnet runs as one service among many on a shared mosaik universe
— a single NetworkId (zipnet::UNIVERSE) that hosts zipnet
alongside other mosaik services (signers, storage, oracles). An
operator stands up an instance under a short, namespaced string
(e.g. acme.mainnet); multiple instances coexist on the same
universe, each with its own committee, ACL, and round parameters.
Consumers bind to an instance by name with one line of Rust:
Zipnet::bind(&network, "acme.mainnet"). There is no on-network
registry; the operator publishes the instance name (and, if TDX-gated,
the committee MR_TD) via release notes or docs, and consumers compile
it in.
The full rationale is in Designing coexisting systems on mosaik.
Three audiences, three entry points
This book is written for three distinct readers. Every page declares its audience on the first line and respects that audience’s tone. Pick the one that matches you:
- Users — Rust developers building agents that publish into, or read from, a zipnet instance somebody else operates. Start at Quickstart — publish and read.
- Operators — devops staff deploying and maintaining instances. Not expected to read Rust. Start at Deployment overview then Quickstart — stand up an instance.
- Contributors — senior Rust engineers with distsys and crypto background, extending the protocol or the code. Start at Designing coexisting systems on mosaik then Architecture.
See Who this book is for for the tone conventions each audience is held to.
What this prototype is
- A permissioned, any-trust broadcast system: anonymity is preserved as long as at least one committee server is honest; liveness requires every committee server to be honest (in v1).
- Real cryptography — X25519 Diffie–Hellman, HKDF-SHA256, AES-128-CTR pad generation, blake3 falsification tags, ed25519 peer signatures (via iroh).
- Real consensus — the committee runs a modified Raft through mosaik’s Group<CommitteeMachine>.
- Real networking — the aggregator and the committee communicate through mosaik typed streams; discovery is gossip + pkarr + mDNS; transport is iroh / QUIC.
What this prototype is not
- A production anonymous broadcast system. Ratcheting, footprint scheduling, cover traffic, multi-tier aggregators, and TDX-only builds are all deferred; they are tracked in Roadmap to v2.
- Byzantine fault tolerant. Mosaik is explicit about this; zipnet inherits the assumption. See Threat model for the precise statement.
Layout of the source tree
crates/
  zipnet             SDK facade (Zipnet::bind, UNIVERSE, instance_id!)
  zipnet-proto       wire types, crypto, XOR
  zipnet-core        Algorithms 1/2/3 as pure functions
  zipnet-node        mosaik integration
  zipnet-client      TEE client binary
  zipnet-aggregator  aggregator binary
  zipnet-server      committee server binary
book/                this book
See Crate map for the dependency graph and purity boundaries.
Who this book is for
audience: all
The zipnet book has three audiences. Every chapter declares its
audience on the first line (audience: users | operators | contributors | both | all) and respects that audience’s conventions.
This page is the authoritative description of each audience and the
tone we hold ourselves to. New pages must pick one.
Mixing audiences wastes readers’ time and erodes trust. When content
genuinely serves more than one group, use both (users + operators,
users + contributors, …) or all, and structure the page so each
audience gets the answer it came for in the first paragraph.
Users
Who they are. External Rust developers building their own mosaik agents that publish into — or read from — a running zipnet instance. They do not run committee servers or the aggregator; that is the operator’s job. They are integrators, not protocol implementers.
What they can assume.
- Comfortable with async Rust and the mosaik book.
- Already have a mosaik application in mind; zipnet is a dependency, not the centre of their work.
- They bring their own Arc<Network> and own its lifecycle.
What they do not need.
- Protocol theory. A user who wants it can follow the link to the contributor pages.
- An explanation of mosaik primitives. Link the mosaik book instead.
- A committee operator’s view of keys, rotations, or monitoring.
What they care about.
- “What do I import?”
- “How do I bind to the operator’s instance?”
- “What does the operator owe me out of band — universe, instance name, MR_TD?”
- “What does an error actually mean when it fires?”
Tone. Code-forward and cookbook-style. Snippets are
rust,ignore, self-contained, and meant to be lifted into the
reader’s workspace. Public API surfaces are listed as tables. Common
pitfalls are called out inline so the reader does not have to infer
them from silence. Second person (“you”) throughout.
Canonical user page. Quickstart — publish and read.
Operators
Who they are. Devops staff deploying and maintaining zipnet instances. They run the committee, the aggregator, and the TDX images. They are the ones the users rely on.
What they can assume.
- Familiar with Linux ops, systemd units, cloud networking, TLS, Prometheus.
- Comfortable reading logs and dashboards.
- Not expected to read Rust source. A Rust or protocol detail that is load-bearing for an operational decision belongs in a clearly marked “dev note” aside that can be skipped.
What they do not need.
- The paper. Link it when a term is inherited; do not re-derive.
- Internal crate layering. The operator cares what a binary does, not which crate it lives in.
- Client-side ergonomics. That is the users’ book.
What they care about.
- “What do I run, on what hardware, with what env vars?”
- “How do I know it is healthy?”
- “How do I rotate secrets / retire an instance / upgrade an image?”
- “What page covers the alert that just fired?”
Tone. Calm, runbook-style. Numbered procedures, parameter tables, one-line shell snippets. Pre-empt the obvious “what if…” questions inline. Avoid “simply” and “just”. Every command should either be safe to run verbatim or clearly marked as needing adaptation.
Canonical operator page. Quickstart — stand up an instance.
Contributors
Who they are. Senior Rust engineers with distributed-systems and cryptography background, extending the protocol or the code, or standing up a new service on mosaik that reuses zipnet’s deployment pattern.
What they can assume.
- Have read the ZIPNet paper (eprint 2024/1227).
- Have read the mosaik book and are comfortable with Stream, Group, Collection, TicketValidator, the when() DSL, and the declare! macros.
- Comfortable with async Rust, Raft, and DC-nets.
What they do not need.
- Re-exposition of the paper. Cite section numbers (e.g. “§3.2”) and move on.
- Primitives covered in the mosaik book. Link it.
- User-level ergonomics unless they drive a design choice.
What they care about.
- “Why is it this shape and not Shape A / B / C / D?”
- “What invariants must hold? Where are they enforced?”
- “What breaks when I bump StateMachine::signature()?”
- “Where do I extend this — which module, which trait, which test?”
Tone. Dense, precise, design-review style. ASCII diagrams,
pseudocode, rationale. rust,ignore snippets and structural
comparisons without apology.
Canonical contributor page. Designing coexisting systems on mosaik.
Shared writing rules
- No emojis anywhere in the book or the code.
- No exclamation marks outside explicit security warnings.
- Link the paper by section number when inheriting its terminology (e.g. “§3.2 scheduling”), not by paraphrase.
- Link the mosaik book rather than re-explaining mosaik primitives. Our readers can follow a link.
- Security-relevant facts are tagged with a visible admonition, not hidden inline.
- Keep the three quickstarts synchronised. When the public SDK shape, the deployment model, or the naming convention changes, update the users, operators, and contributors quickstarts together, not “this one first, the others later”.
What you need from the operator
audience: users
Before you can write a line of code against a running zipnet deployment, collect two (or three, if it is TDX-gated) items from whoever runs it. That is the whole handshake — zipnet does not gossip an instance registry, so everything you need to reach the deployment has to arrive out of band.
The handshake
| # | Item | What it is | Where it goes in your code |
|---|---|---|---|
| 1 | Instance name | Short namespaced string that names the deployment. Examples: acme.mainnet, preview.alpha, dev.ci-42. | Zipnet::bind(&network, "acme.mainnet") |
| 2 | Bootstrap PeerId | At least one reachable peer on the shared universe — typically the operator’s aggregator or a committee server. Without one, cold-start discovery falls back to the Mainline DHT and takes minutes instead of seconds. | discovery::Config::builder().with_bootstrap(peer_id) on the Network builder. |
| 3 | Committee MR_TD (TDX-gated deployments only) | 48-byte measurement of the operator’s committee image, hex-encoded. Pin this if your agent verifies inbound committee attestation, or match it if you are building a client image. | See TEE-gated deployments for which applies to your setup. |
The instance name is the one thing that differs between deployments.
It fully determines every on-wire ID the SDK uses — committee
GroupId, submit StreamId, broadcasts StoreId, ticket class — via
a single blake3("zipnet." + instance_name) derivation. If your
string disagrees with the operator’s by one character, your code
derives IDs nobody is serving, and Zipnet::bind returns
Error::ConnectTimeout after the bond window elapses.
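The shape of that failure mode is easy to picture with a toy derivation. This sketch uses std’s DefaultHasher as a stand-in for the SDK’s blake3 derivation — derive_id and its 64-bit output are illustrative only, not the real wire format:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the SDK's salt derivation: every
/// on-wire ID flows from one string, "zipnet." + instance name.
fn derive_id(instance_name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    format!("zipnet.{instance_name}").hash(&mut h);
    h.finish()
}

fn main() {
    // One character of drift yields an unrelated ID: your client
    // derives stream/group IDs nobody is serving and times out.
    assert_ne!(derive_id("acme.mainnet"), derive_id("acme.mainet"));
    // The derivation is deterministic, so both sides agree on the
    // concrete IDs from the name alone, with no registry lookup.
    assert_eq!(derive_id("acme.mainnet"), derive_id("acme.mainnet"));
}
```

The same property is why operators never hand out raw IDs: the name is the whole configuration surface.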
The bootstrap peer is universe-level, not zipnet-specific. Any reachable peer on the shared universe is a valid starting point; once you are bonded, mosaik’s discovery finds the specific instance’s committee and aggregator through the shared peer catalog.
The MR_TD is relevant only if the operator has turned on TDX gating. Most development deployments do not; production often does.
What you do not need to ask for
- The universe NetworkId. It is zipnet::UNIVERSE — a shared constant baked into the SDK. Every operator and every user on zipnet uses the same value. You only need an operator-supplied override in the rare case they run an isolated federation on a different universe; assume they will tell you explicitly if so.
- Per-instance StreamId / StoreId / GroupId values. The SDK derives all of them from the instance name. Operators never hand these out, and the facade does not accept them.
- Committee server secrets or any committee member’s X25519 secret. You are a consumer, not a committee member.
- A seat on the committee’s Raft group. The SDK reads the broadcast log through a replicated collection; it does not vote.
How the handshake travels
Out of band. Release notes, a README in the operator’s repo, a Slack message, a secret-manager entry. Zipnet deliberately does not carry an on-network registry — the shared-universe model assumes consumers reference the instance name they trust at compile time, rather than discovering “what instances exist” at runtime. See Designing coexisting systems on mosaik for the rationale.
Pinning the instance name at compile time
A typo in the instance name silently produces a different UniqueId
and surfaces as ConnectTimeout. For production code, bake the name
in with the instance_id! macro so typos become build errors:
use zipnet::{Zipnet, UniqueId, UNIVERSE};
const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");
let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;
instance_id!("acme.mainnet") and zipnet::instance_id("acme.mainnet")
produce identical bytes, so an operator’s ZIPNET_INSTANCE=acme.mainnet
env var and your compile-time constant land on the same UniqueId.
What you bring yourself
- Your mosaik SecretKey if you want a stable PeerId across restarts. Leave it unset to get a random identity per run, which is the usual choice for anonymous-use-case clients. See Identity.
- Your message payloads. The SDK does not care what bytes you put in — any impl Into<Vec<u8>>.
Minimal smoke test before writing anything substantial
Once you have the two items (three if TDX-gated), this program publishes to the deployment and prints a receipt within a few round periods:
use std::sync::Arc;
use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let bootstrap = "<paste-the-operator's-peer-id>".parse()?;
let network = Arc::new(
Network::builder(UNIVERSE)
.with_discovery(discovery::Config::builder().with_bootstrap(bootstrap))
.build()
.await?,
);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let receipt = zipnet.publish(b"hello from my laptop").await?;
println!("landed in round {} slot {}", receipt.round, receipt.slot);
Ok(())
}
If bind returns ConnectTimeout, the instance name or the bootstrap
peer is the first suspect — see Troubleshooting.
Trust
The operator is trusted for liveness — they can stall or kill rounds at will. They are not trusted for anonymity, provided the any-trust assumption holds across their committee. See Threat model if you are auditing before integrating.
Quickstart — publish and read
audience: users
You bring a mosaik::Network; the SDK layers ZIPNet on top of it as
one service among many on a shared mosaik universe. Every deployment
is identified by an instance name. You bind to the one you want
with Zipnet::bind(&network, instance_name).
Why you might want this
You’re building something where a bounded, authenticated set of participants needs to publish messages without revealing which participant sent which. The canonical case is an encrypted mempool: TDX-attested wallets seal transactions and publish them through zipnet; builders read an ordered broadcast log of sealed transactions; nobody — not even a compromised builder — can link a transaction to its sender until on-chain execution reveals whatever the transaction itself reveals. The encryption layer (threshold decryption, TEE unsealing, or none) sits on top; zipnet supplies the anonymous, ordered, sybil-resistant publish channel underneath.
Other deployments in the same shape:
- Permissioned order-flow auctions. Whitelisted searchers publish intents; builders bid without knowing which searcher sent what.
- Anonymous governance signalling. Token-holder wallets cast signals a delegate can tally without learning which wallet sent any given one.
- Private sealed-bid auctions. Bidders publish; outcome is public; bid-to-bidder linkage is cryptographic.
What zipnet uniquely provides across these:
- Sender anonymity within an attested set. A compromised reader cannot tie a message back to its author unless every committee operator colludes (any-trust).
- Shared ordered view. Every subscriber sees the same log in the same order. No relay-race asymmetry between readers.
- Sybil resistance. Only TDX-attested clients can publish.
- Censorship resistance at the publish layer. Readers can’t drop messages from specific authors because authorship is unlinkable.
If you’re the operator standing up the deployment rather than using one, read the operator quickstart instead.
The one-paragraph mental model
A mosaik universe is a single shared NetworkId. Many services — zipnet,
multisig signers, secure storage, oracles — live on it simultaneously.
An operator can run any number of instances of zipnet (“mainnet”,
“preview.alpha”, “acme-corp”) concurrently on the same universe; each
instance has its own committee, its own ACL, its own round parameters,
and its own ticket class. You pick the one you want by name — the
operator tells you which name to use, and your code bakes it in. No
registry lookup, no runtime discovery of “what instances exist”. The
same Arc<Network> handle can also bind to other services without
needing a second network.
Cargo.toml
[dependencies]
zipnet = "0.1"
mosaik = "=0.3.17"
tokio = { version = "1", features = ["full"] }
futures = "0.3"
anyhow = "1"
zipnet re-exports mosaik::{Tag, unique_id!} so you rarely reach
for mosaik directly in small agents, but you’ll usually keep mosaik
as a direct dep since you’re the one owning the Network.
Publisher
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet = Zipnet::bind(&network, "mainnet").await?;
    let receipt = zipnet.publish(b"hello from my agent").await?;
    println!("landed in round {} slot {}", receipt.round, receipt.slot);
    Ok(())
}
Three lines inside main:
- Create a mosaik network on the shared universe NetworkId.
- Bind to the mainnet zipnet instance. The SDK resolves the instance salt to concrete stream, collection, and group IDs, installs the client identity, attaches the bundle ticket, and waits until you are in a live round’s roster.
- publish resolves after the broadcast finalizes.
UNIVERSE is the shared NetworkId that hosts the deployment. Zipnet
exports this constant today; when mosaik ships a canonical universe
constant, this value will be re-exported verbatim. See
Designing coexisting systems on mosaik
for the full rationale.
Subscriber
use std::sync::Arc;
use futures::StreamExt;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet = Zipnet::bind(&network, "mainnet").await?;
    let mut rounds = zipnet.subscribe().await?;
    while let Some(round) = rounds.next().await {
        for msg in round.messages() {
            println!("round {}: {:?}", round.id(), msg.bytes());
        }
    }
    Ok(())
}
round.messages() yields only payloads that decoded cleanly —
falsification-tag verification and collision filtering happen inside
the SDK. Reach for round.raw() if you need the BroadcastRecord.
Binding to a testnet, devnet, or tenant instance
Instance names are free-form strings; well-known names are
conventions, not types. An operator running a testnet gives you its
instance name (e.g. preview.alpha) along with the universe-level
bootstrap peers and any required TDX measurement.
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet = Zipnet::bind(&network, "preview.alpha").await?;
    let _ = zipnet.publish(b"hi from testnet").await?;
    Ok(())
}
| Instance name | What operators commonly use it for |
|---|---|
| mainnet | Production deployment, long-lived committee |
| preview.<tag> | Long-lived testnet on a per-tag TDX image |
| dev.<tag> | Per-developer or per-CI-job ephemeral instance |
| anything else | Whatever the operator tells you |
The SDK itself does not dispatch on the name — TDX attestation is
controlled by the tee-tdx Cargo feature on the zipnet crate, not
by the instance name you pick. The table above is naming convention,
not policy.
The instance name is the only piece of zipnet-specific identity the
SDK needs. It fully determines the committee GroupId, the submit
StreamId, the broadcasts StoreId, and the ticket class — all
derived from one salt (see
Designing coexisting systems on mosaik).
A typo in the instance name is silent — your code derives different
IDs than the operator, no one picks up, and bind returns
ConnectTimeout after the bond window elapses. For production,
consider pinning the instance as a compile-time UniqueId constant
using the instance_id! macro, so a typo
is caught at build time:
use zipnet::{Zipnet, UniqueId, UNIVERSE};
const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");
let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;
The instance_id! macro and the runtime instance_id function
produce identical bytes for the same name, so the operator’s
ZIPNET_INSTANCE=acme.mainnet env var and your compile-time constant
land on the same UniqueId.
Sharing one Network across services and instances
Because Zipnet::bind only takes &Arc<Network>, one network handle
can simultaneously serve many services and many instances of the same
service:
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);

    // two zipnet instances side by side
    let prod = Zipnet::bind(&network, "mainnet").await?;
    let testnet = Zipnet::bind(&network, "preview.alpha").await?;

    // …and unrelated services on the same network
    // let multisig = Multisig::bind(&network, "treasury").await?;
    // let storage = Storage::bind(&network, "archive").await?;

    let _ = prod.publish(b"production message").await?;
    let _ = testnet.publish(b"dry-run message").await?;
    Ok(())
}
Every instance and every service derives its own IDs from its own salt, so they coexist on the shared catalog without collision. You pay for one mosaik endpoint, one DHT record, one gossip loop — not one per service.
Bring-your-own-config
You keep full control of the mosaik builder; the SDK never constructs
the Network for you:
use std::{net::SocketAddr, sync::Arc};
use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(
        Network::builder(UNIVERSE)
            .with_mdns_discovery(true)
            .with_discovery(discovery::Config::builder().with_bootstrap(universe_bootstrap_peers()))
            .with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
            .build()
            .await?,
    );
    let zipnet = Zipnet::bind(&network, "mainnet").await?;
    let _ = zipnet.publish(b"hi").await?;
    Ok(())
}

fn universe_bootstrap_peers() -> Vec<mosaik::PeerId> { vec![] }
Bootstrap peers are universe-level, not zipnet-specific. Any
reachable peer on the shared network — a mosaik registry node, a
friendly operator’s aggregator, your own relay — works as a starting
point. Once you’re bonded, Zipnet::bind locates the specific
instance’s committee and aggregator via the shared peer catalog.
What you get back
pub struct Receipt {
    pub round: zipnet::RoundId,
    pub slot: usize,
    pub outcome: zipnet::Outcome,
}

pub enum Outcome { Landed, Collided, Dropped }

pub struct Round { /* opaque */ }

impl Round {
    pub fn id(&self) -> zipnet::RoundId;
    pub fn messages(&self) -> impl Iterator<Item = zipnet::Message>;
    pub fn raw(&self) -> &zipnet::BroadcastRecord;
}

pub struct Message { /* opaque */ }

impl Message {
    pub fn bytes(&self) -> &[u8];
    pub fn slot(&self) -> usize;
}
Almost every application uses Receipt::outcome and
Message::bytes() and ignores the rest.
Error model
pub enum Error {
    WrongUniverse { expected: mosaik::NetworkId, actual: mosaik::NetworkId },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}
ConnectTimeout is the one you’ll hit in development — usually a
typo in the instance name (you’re deriving a GroupId nobody is
serving), an unreachable bootstrap peer, or an operator whose
committee isn’t up yet. WrongUniverse shows up if your Network
was built against a different universe NetworkId than the SDK
expects.
Cover traffic is on by default
An idle Zipnet handle sends a cover envelope each round to widen
the anonymity set. See
Publishing messages for
how to tune or disable it.
Shutdown
drop(zipnet); // fine — the driver task exits cleanly
zipnet.shutdown().await?; // if you want to flush pending publishes first
Dropping one Zipnet handle only shuts that binding down; the
Network stays up as long as other handles (or you) hold it. This is
the intended pattern when one process talks to several services or
several instances.
Next reading
- What you need from the operator — the fact sheet the operator gives you before writing code.
- Publishing messages — fire-and-forget, cover traffic, retry policy.
- Reading the broadcast log — replay, gap detection, filtering.
- Client identity and registration — stable vs ephemeral ClientId.
- TEE-gated deployments — TDX builds, measurement rollouts.
- Designing coexisting systems on mosaik — the shared-universe / instance-name model in full.
- API reference — full type list.
Client identity
audience: users
A zipnet client has two distinct identities that work together. The SDK
manages one of them for you; the other you control through the mosaik
Network you hand to Zipnet::bind.
Two identities
| Identity | Type | Where it comes from | Purpose |
|---|---|---|---|
| PeerId | ed25519 public key | mosaik / iroh SecretKey on the Network | Authenticates you on the wire. Signs your PeerEntry. |
| Client-side DH identity | X25519 keypair | Generated inside Zipnet::bind per binding | Names your slot in the anonymous-broadcast rounds. Binds your pads. |
The DH identity is internal to zipnet and not exposed across the SDK
surface — you never see a ClientId or DhSecret type in user code.
Every call to Zipnet::bind generates a fresh DH keypair, installs
the matching bundle ticket through mosaik’s discovery layer, and waits
until the committee admits the binding into a live round. When you
drop the Zipnet handle, that keypair and its ticket go with it.
Your PeerId is the only identity you materially choose.
Choose your PeerId lifetime
Fully ephemeral (default)
Build the Network without calling with_secret_key. Mosaik picks a
random iroh identity per run. Combined with the per-bind DH
identity, this means every process run is an unlinkable
(PeerId, client-DH) pair.
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};
let network = Arc::new(Network::new(UNIVERSE).await?);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
This is the right default for anonymous use cases. An observer
correlating PeerIds across rounds learns only “this peer was online
during this interval” — which is what mosaik’s transport layer exposes
anyway, independent of zipnet.
Stable PeerId, ephemeral client DH identity
Useful when you want a predictable bootstrap target (your agent’s
PeerId stays the same across restarts) but you don’t want to be
correlatable inside zipnet rounds. Each bind gets a fresh DH keypair
regardless of the PeerId.
use std::sync::Arc;
use mosaik::{Network, SecretKey};
use zipnet::{Zipnet, UNIVERSE};

let sk = SecretKey::from_bytes(&my_seed_bytes);
let network = Arc::new(
    Network::builder(UNIVERSE)
        .with_secret_key(sk)
        .build()
        .await?,
);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
The rebind produces a new client DH identity even with the same
PeerId, so rounds stay unlinkable at the zipnet layer. If you hold
one Zipnet handle for a long time and publish many messages, those
messages share one client DH identity and are linkable to each
other. To rotate, drop the handle and call bind again.
Stable everything (rare)
The current SDK does not expose a way to persist the per-binding DH identity across restarts. If you need stable client identity for a reputation or allowlist use case, talk to the operator about attested-client TDX features — see TEE-gated deployments. Stable anonymous-publish identity at the application layer is an anti-pattern: it trivially breaks unlinkability across rounds.
Multiple bindings per process
Zipnet::bind only borrows the Arc<Network>, so one network can
host many bindings — the same instance many times, different instances
side by side, or zipnet alongside other mosaik services:
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};
let network = Arc::new(Network::new(UNIVERSE).await?);
let prod = Zipnet::bind(&network, "acme.mainnet").await?;
let testnet = Zipnet::bind(&network, "preview.alpha").await?;
let prod_bis = Zipnet::bind(&network, "acme.mainnet").await?;
prod and prod_bis have the same PeerId but independent
client DH identities; the committee treats them as two distinct
publishers. This is occasionally useful for widening your own
anonymity set in test deployments, but it does not buy you extra
anonymity in production against a global observer watching your
network interface.
Rotating
Drop the Zipnet handle and call bind again:
drop(prod);
let prod = Zipnet::bind(&network, "acme.mainnet").await?;
drop tears down the driver task, removes the bundle ticket from
discovery, and lets the committee’s roster forget the old DH
identity at the next gossip cycle. The next bind starts clean.
If you want to flush pending publishes before dropping, prefer
zipnet.shutdown().await? — see Publishing.
What about the peer catalog?
The mosaik peer catalog — network.discovery().catalog() — lists
every peer zipnet and anything else on the shared universe sees. It
is not zipnet-specific, and the SDK does not ask you to interact with
it. If you need to inspect it for debugging, see the
mosaik book on discovery.
Publishing messages
audience: users
Everything about getting a payload into a finalized broadcast round.
The whole surface
impl Zipnet {
    pub async fn publish(&self, payload: impl Into<Vec<u8>>) -> Result<Receipt>;
}

pub struct Receipt {
    pub round: zipnet::RoundId,
    pub slot: usize,
    pub outcome: zipnet::Outcome,
}

pub enum Outcome { Landed, Collided, Dropped }
One call per message. publish resolves after the round carrying the payload finalizes — not when the aggregator accepts the envelope. The Receipt tells you what actually happened.
Fire-and-forget
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
    let _ = zipnet.publish(b"hello").await?;
    Ok(())
}
If you don’t care whether the message landed, collided, or was
dropped, discard the Receipt. Many applications do exactly this —
the encryption or ordering layer built on top replays lost messages
at the application level.
Inspecting the outcome
use zipnet::Outcome;

let receipt = zipnet.publish(payload).await?;
match receipt.outcome {
    Outcome::Landed => tracing::info!(round = %receipt.round, "published"),
    Outcome::Collided => {
        // Another client hashed to the same slot this round. Both
        // payloads are XOR-corrupted. Retry on the next round.
        tracing::warn!(round = %receipt.round, "collision, retrying");
        // … call publish again with the same payload …
    }
    Outcome::Dropped => {
        // The aggregator never forwarded the envelope into a
        // committed aggregate. Usually transient — the aggregator
        // was offline or our registration hadn't propagated yet.
        tracing::warn!(round = %receipt.round, "dropped, retrying");
    }
}
Landed is the happy path. Under default parameters and a small
active set, most rounds produce Landed for everyone.
Retry policy
Zipnet does not retry for you. If you need at-least-once delivery
at the application layer, wrap publish in your own loop:
use zipnet::{Outcome, Zipnet};
async fn publish_with_retry(z: &Zipnet, payload: Vec<u8>, attempts: u32)
    -> zipnet::Result<zipnet::Receipt>
{
    // Protocol-level errors surface immediately via `?`. If every
    // attempt fails, return the last receipt (probably
    // Collided/Dropped) so the caller can inspect it.
    let mut last = z.publish(payload.clone()).await?;
    for _ in 1..attempts {
        if matches!(last.outcome, Outcome::Landed) {
            return Ok(last);
        }
        last = z.publish(payload.clone()).await?;
    }
    Ok(last)
}
Retry latency is bounded by the round cadence of the deployment — at
the default ~2 s round period, three attempts cost up to ~6 s. Tune
attempts to your SLA.
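As a sketch of that tuning, assuming the ~2 s default round period quoted above (the function name is ours, not part of the SDK):

```rust
// Worst-case retry latency: each publish resolves at a round
// boundary, so `attempts` tries cost up to attempts × round_period.
// Invert that to pick an attempt count that fits a latency SLA.
fn max_attempts_within(sla_secs: f64, round_period_secs: f64) -> u32 {
    (sla_secs / round_period_secs).floor() as u32
}

fn main() {
    // A 6 s SLA at the ~2 s default leaves room for 3 attempts.
    assert_eq!(max_attempts_within(6.0, 2.0), 3);
}
```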
Payload budget
The SDK accepts impl Into<Vec<u8>>. Internally, a payload that
exceeds the round’s per-slot budget is rejected; payloads that fit
are zero-padded into their slot. Current default round parameters
give you 240 bytes of application payload per publish. For
larger messages, split at the application layer — the protocol does
not frame for you.
If your deployment uses non-default parameters, the operator will
tell you the budget. The SDK surfaces oversized payloads as
Error::Protocol("payload too large: …"); see
Troubleshooting.
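To illustrate the application-layer splitting mentioned above, here is a minimal framing sketch. The 3-byte header (message id, frame index, frame count) is our own hypothetical convention, not part of the zipnet wire format; 240 bytes is the default budget quoted above.

```rust
// Hypothetical application-layer framing: split a message into
// frames that each fit the default 240-byte per-slot budget. The
// 3-byte header (msg id, frame index, frame count) is an assumed
// convention for this sketch, not a zipnet protocol feature.
const SLOT_BUDGET: usize = 240;
const HEADER_LEN: usize = 3;

fn split_into_frames(msg_id: u8, payload: &[u8]) -> Vec<Vec<u8>> {
    let body = SLOT_BUDGET - HEADER_LEN; // 237 payload bytes per frame
    let count = payload.chunks(body).count().max(1) as u8;
    payload
        .chunks(body)
        .enumerate()
        .map(|(i, chunk)| {
            let mut frame = Vec::with_capacity(HEADER_LEN + chunk.len());
            frame.extend_from_slice(&[msg_id, i as u8, count]);
            frame.extend_from_slice(chunk);
            frame
        })
        .collect()
}

fn main() {
    let frames = split_into_frames(7, &[0u8; 500]);
    assert_eq!(frames.len(), 3); // ceil(500 / 237)
    assert!(frames.iter().all(|f| f.len() <= SLOT_BUDGET));
}
```

Each frame would then go through its own publish call; the receiver reassembles by (msg id, frame index) once all `count` frames have landed.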
Cover traffic is on by default
An idle Zipnet handle sends a cover envelope each round to widen
the anonymity set. Cover envelopes do not show up as publish calls
on your side — they are generated automatically by the binding’s
driver task. Observers cannot distinguish a cover round from a
real-payload round for any given participant.
There is no SDK knob to tune cover-traffic rate today. If you hold a
Zipnet handle, you participate; if you drop it, you don’t. For
applications that want to only appear for certain rounds, bind
immediately before you need to publish and drop immediately after —
see Identity.
Parallel publishes on one handle
Zipnet is Clone and internally Arc-wrapped. Concurrent
publish calls on one handle are fine; the driver serializes them
per-round and emits at most one payload per round per binding. If
you call publish twice during the same round window, the second
call waits for the next round rather than sharing the slot.
let z = Zipnet::bind(&network, "acme.mainnet").await?;
let a = z.clone();
let b = z.clone();
let (ra, rb) = tokio::join!(
a.publish(b"message A"),
b.publish(b"message B"),
);
// ra and rb come from different rounds.
If you need higher throughput per wall-clock second, the right lever
is operator-side round cadence or num_slots. From the SDK, one
binding is one slot per round.
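The throughput ceiling that implies can be put in numbers — a back-of-envelope using the defaults quoted in this chapter (240-byte budget, ~2 s rounds); the function is illustrative, not SDK API:

```rust
// One binding emits at most one slot per round, so per-binding
// throughput is capped at slot_budget / round_period.
fn max_bytes_per_sec(slot_budget_bytes: f64, round_period_secs: f64) -> f64 {
    slot_budget_bytes / round_period_secs
}

fn main() {
    // 240-byte budget at ~2 s rounds: ~120 B/s per binding.
    assert_eq!(max_bytes_per_sec(240.0, 2.0), 120.0);
}
```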
Shutdown
drop(zipnet); // fine — the driver exits cleanly, in-flight publishes may be lost
zipnet.shutdown().await?; // waits for in-flight receipts, then tears down
shutdown returns Error::Shutdown if the binding was already
closing. Otherwise the call resolves once pending publishes have
either landed or been marked Dropped. Use it in application-level
shutdown paths where losing a trailing publish would be surprising.
Dropping one Zipnet handle does not tear down the Network.
Other services or other zipnet instances sharing the same
Arc<Network> keep running.
Reading the broadcast log
audience: users
Zipnet::subscribe returns a stream of finalized rounds. Every
subscriber sees the same log in the same order.
The whole surface
impl Zipnet {
pub async fn subscribe(&self) -> Result<BroadcastStream>;
}
// BroadcastStream implements futures::Stream<Item = Round>.
pub struct Round { /* opaque */ }
impl Round {
pub fn id(&self) -> zipnet::RoundId;
pub fn messages(&self) -> impl Iterator<Item = Message>;
pub fn raw(&self) -> &zipnet::BroadcastRecord;
}
pub struct Message { /* opaque */ }
impl Message {
pub fn bytes(&self) -> &[u8];
pub fn slot(&self) -> usize;
}
One call per subscriber. Every call to subscribe returns a fresh
receiver; handles are cheap.
Tail the log as it grows
use std::sync::Arc;
use futures::StreamExt;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let network = Arc::new(Network::new(UNIVERSE).await?);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let mut rounds = zipnet.subscribe().await?;
while let Some(round) = rounds.next().await {
for msg in round.messages() {
println!("round {}: {:?}", round.id(), msg.bytes());
}
}
Ok(())
}
round.messages() yields only payloads that decoded cleanly —
falsification-tag verification and collision filtering happen inside
the SDK. You see the application bytes the publisher actually sealed,
not the raw slot bytes.
round.id() is monotonically increasing. Consecutive items from the
stream have strictly increasing ids under normal operation.
Wait for a specific round
use futures::StreamExt;
use zipnet::{Zipnet, RoundId};
async fn wait_for_round(
zipnet: &Zipnet,
target: RoundId,
) -> anyhow::Result<zipnet::Round> {
let mut rounds = zipnet.subscribe().await?;
while let Some(round) = rounds.next().await {
if round.id() == target {
return Ok(round);
}
if round.id() > target {
anyhow::bail!("round {target} is already in the past");
}
}
anyhow::bail!("stream closed before round {target}")
}
If you subscribe after the round you care about has already
finalized, the stream will skip past it — it only yields rounds
that finalize after subscribe returns. Keep your subscription
open if you care about a specific future round.
Gap detection and catch-up
A fresh subscription begins from whatever the committee finalizes next. Earlier rounds are not replayed. If you need the full history, open the subscription before you publish anything and buffer yourself.
If the subscriber falls behind — usually because your round handler
is slower than the round cadence — the SDK’s internal broadcast
channel lags. You see this as a round id gap: one call to
rounds.next().await returns round N, the next returns round
N + k for some k > 1. The lost rounds are gone; the SDK does
not backfill them. The fix is to make the handler non-blocking —
offload heavy work to a separate task:
use futures::StreamExt;
use tokio::sync::mpsc;
let mut rounds = zipnet.subscribe().await?;
let (tx, mut rx) = mpsc::channel(1024);
// Producer: drain the SDK stream as fast as it delivers.
tokio::spawn(async move {
while let Some(round) = rounds.next().await {
if tx.send(round).await.is_err() {
break;
}
}
});
// Consumer: heavy per-round work that can tolerate small bursts.
while let Some(round) = rx.recv().await {
handle(round).await;
}
With this shape, the SDK’s internal buffer drains continuously; the bounded channel between tasks is the one that can fill up, and you control its size.
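If you want to count what you lost, the gap arithmetic is simple. A minimal sketch, modeling RoundId as a plain u64 for illustration (the real type is zipnet::RoundId):

```rust
// Gap accounting for the pattern described above: given the last
// round id seen and the next one delivered, how many rounds did
// the SDK skip?
fn rounds_lost(prev: Option<u64>, next: u64) -> u64 {
    match prev {
        Some(p) if next > p + 1 => next - p - 1, // these are gone
        _ => 0,
    }
}

fn main() {
    assert_eq!(rounds_lost(None, 10), 0);     // first round seen
    assert_eq!(rounds_lost(Some(10), 11), 0); // consecutive: no gap
    assert_eq!(rounds_lost(Some(10), 14), 3); // rounds 11..=13 dropped
}
```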
Raw access
Round::messages() hides everything zipnet-specific — which
client occupied which slot, how the broadcast vector was laid out,
server roster for the round. When you need the underlying
BroadcastRecord, reach for raw():
use zipnet::BroadcastRecord;
while let Some(round) = rounds.next().await {
let rec: &BroadcastRecord = round.raw();
tracing::debug!(
round = %rec.round,
n_participants = rec.participants.len(),
n_servers = rec.servers.len(),
broadcast_bytes = rec.broadcast.len(),
);
}
BroadcastRecord is a public type from zipnet-proto. Most
applications never need it — the hidden-behind-messages() decode
pipeline is what you want.
Multiple subscribers
One Zipnet handle can produce many subscribers:
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let mut rounds_a = zipnet.subscribe().await?;
let mut rounds_b = zipnet.subscribe().await?;
Both receive the same rounds in the same order. Independent lag: slowing down subscriber A does not affect subscriber B.
Shutdown
Dropping the stream is enough. The SDK’s driver keeps running as long
as the Zipnet handle lives; the next subscribe call gives you a
fresh stream from the then-current point in the log.
Connecting to the universe
audience: users
The nuts and bolts of building the Arc<Network> that Zipnet::bind
attaches to. The zipnet SDK never constructs the network for you —
this is intentional. One network can host zipnet alongside other
mosaik services on the shared universe, and you own its lifetime.
The minimum
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let network = Arc::new(Network::new(UNIVERSE).await?);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let _ = zipnet.publish(b"hello").await?;
Ok(())
}
Network::new(UNIVERSE) produces a network with default mosaik
settings — random SecretKey, mDNS off, no bootstrap peers, no
prometheus endpoint. Enough for local integration tests; rarely
enough for a real deployment.
Bring your own builder
For anything beyond a local experiment, use Network::builder:
use std::{net::SocketAddr, sync::Arc};
use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let network = Arc::new(
Network::builder(UNIVERSE)
.with_mdns_discovery(true)
.with_discovery(
discovery::Config::builder()
.with_bootstrap(universe_bootstrap_peers()),
)
.with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
.build()
.await?,
);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let _ = zipnet.publish(b"hi").await?;
Ok(())
}
fn universe_bootstrap_peers() -> Vec<mosaik::PeerId> { vec![] }
Every argument above is a mosaik concern, not a zipnet one. The full builder reference lives in the mosaik book. The rest of this page covers the fields that matter most for a zipnet user.
UNIVERSE
zipnet::UNIVERSE is the shared NetworkId every zipnet deployment
lives on. Today it is mosaik::unique_id!("mosaik.universe"). When
mosaik ships its own canonical universe constant, this value will be
re-exported verbatim.
If your Network is on a different NetworkId, Zipnet::bind
rejects it with Error::WrongUniverse { expected, actual } before
any I/O happens. There is no way to tunnel zipnet over a
non-universe network; the SDK hard-checks this.
Bootstrap peers
Universe-level, not zipnet-specific. Any reachable peer on the shared universe works as a bootstrap — a mosaik registry node, a friendly operator’s aggregator, your own persistent relay. The operator does not typically hand out zipnet-instance-specific bootstrap peers; they publish one set of universe bootstraps that their zipnet instance (and any other services they host) joins through.
Once your network is bonded to the universe, Zipnet::bind finds
the specific instance’s committee through the shared peer catalog —
you do not need to know anything zipnet-specific at network-builder
time.
use mosaik::discovery;
use zipnet::UNIVERSE;
let network = mosaik::Network::builder(UNIVERSE)
.with_discovery(
discovery::Config::builder()
.with_bootstrap(vec![
// universe-level bootstrap peer IDs, operator-supplied
]),
)
.build()
.await?;
On first connect with no bootstrap peers you fall back to the DHT. That works, but it is slow (tens of seconds on a cold start). At least one bootstrap peer is a practical requirement for anything beyond local tests.
mDNS
.with_mdns_discovery(true) collapses discovery latency from minutes
to seconds on a shared LAN and is harmless elsewhere. Turn it off
only if your security posture forbids advertising peers over mDNS.
Secret key
Omit .with_secret_key(...) for a fresh iroh identity per run. Set a
stable SecretKey if you want a predictable PeerId across
restarts. See Client identity for when each is
appropriate.
Reaching the universe from behind NAT
iroh handles NAT traversal through its relay infrastructure. Most residential and office setups need no extra configuration. Things that help when they don’t:
- Outbound UDP must be allowed. iroh uses QUIC over UDP.
- Full-cone NAT or better traverses directly. Symmetric NAT falls back to the relay — it still works, with extra latency.
- UDP-terminating proxies break iroh. Run the agent from a host with raw outbound UDP.
At startup the network logs its relay choice:
relay-actor: home is now relay https://euc1-1.relay.n0.iroh-canary.iroh.link./
Repeated “Failed to connect to relay server” warnings mean your outbound path is broken; discovery mostly still works via DHT, just slow.
Observability for your own agent
use std::{net::SocketAddr, sync::Arc};
use mosaik::Network;
use zipnet::UNIVERSE;
let network = Arc::new(
Network::builder(UNIVERSE)
.with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
.build()
.await?,
);
Then scrape http://127.0.0.1:9100/metrics — you’ll get mosaik’s
metrics plus whatever you emit with the metrics crate. The zipnet
SDK does not expose its own top-level metrics endpoint; observability
is the network’s job.
One network, many services and instances
Because Zipnet::bind only borrows &Arc<Network>, you pay for one
mosaik endpoint across every service and instance you bind:
use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};
let network = Arc::new(Network::new(UNIVERSE).await?);
let prod = Zipnet::bind(&network, "acme.mainnet").await?;
let testnet = Zipnet::bind(&network, "preview.alpha").await?;
// let multisig = Multisig::bind(&network, "treasury").await?; // hypothetical
// let storage = Storage::bind(&network, "archive").await?; // hypothetical
Each binding derives its own IDs from its own salt, so they coexist on the shared peer catalog without collision. One UDP socket, one DHT record, one gossip loop.
Graceful shutdown
drop(network);
drop cancels everything — open streams, collection readers, bonds.
Mosaik emits a gossip departure so the operator’s logs show you
leaving cleanly. If you want to flush pending zipnet publishes first,
call zipnet.shutdown().await? on each binding before dropping the
network. See Publishing — Shutdown.
Cold-start checklist
If your agent starts but Zipnet::bind returns ConnectTimeout:
- The Arc<Network> is on UNIVERSE. If you see WrongUniverse instead, the network was built against a different NetworkId. Switch back to UNIVERSE.
- The instance name matches the operator’s exactly. Typos surface as ConnectTimeout, not InstanceNotFound. Consider pinning via zipnet::instance_id!("name") so the name is checked at build time.
- Bootstrap PeerIds are reachable. nc -zv <their_host> or whatever the operator tells you to test.
- Outbound UDP is allowed. iperf over UDP to a public host.
- Your mosaik version matches (=0.3.17). Any minor-version drift changes wire formats.
If none of these resolves it, see Troubleshooting.
TEE-gated deployments
audience: users
Some zipnet deployments require every participant — committee members and publishing clients — to run inside a TDX enclave whose measurement matches the operator’s expected MR_TD. This chapter covers the user side of that setup.
Is the deployment TEE-gated?
Ask the operator. Specifically:
- Does the committee stack a Tdx validator on its admission tickets?
- If so, what MR_TD must your client image report?
If the answer to the first question is no, skip this chapter — the rest of the user guide applies unchanged.
How the SDK decides whether to attest
TDX is a Cargo feature on the zipnet crate, not a function of the
instance name:
- tee-tdx disabled (default). The SDK runs a mocked attestation path. Your PeerEntry does not carry a TDX quote. A TDX-gated operator’s committee rejects you at bond time — you see Error::ConnectTimeout (the rejection is silent at the discovery layer) or Error::Attestation if the operator has enabled a stricter surfacing mode.
- tee-tdx enabled. Zipnet::bind uses mosaik’s real TDX path to generate a quote bound to your current PeerId and attach it to your discovery entry. The committee validates the quote before admitting you.
# Cargo.toml for a user-side agent that must attest.
[dependencies]
zipnet = { version = "0.1", features = ["tee-tdx"] }
With the feature on, your binary only runs correctly inside a real
TDX guest. The TDX hardware refuses to quote from a non-TDX machine,
so bind surfaces that as Error::Attestation("…").
Build-time: produce a TDX image
Add mosaik’s TDX builder to your crate:
[build-dependencies]
mosaik = { version = "=0.3.17", features = ["tdx-builder-alpine"] }
# or: features = ["tdx-builder-ubuntu"]
build.rs:
fn main() {
mosaik::tee::tdx::build::alpine().build();
}
This produces a bootable TDX guest image at
target/<profile>/tdx-artifacts/<crate>/alpine/ plus a precomputed
<crate>-mrtd.hex. The operator either uses your MR_TD as their
expected value, or — if they pin a specific image — hands you theirs
and you rebuild to match.
The mosaik TDX reference covers Alpine vs Ubuntu trade-offs, SSH and kernel customization, and environment-variable overrides.
The operator → user handshake for TDX
A TDX-gated deployment adds one item to the three-item handshake in What you need from the operator:
| Item | What it is |
|---|---|
| Committee MR_TD | The 48-byte hex measurement the operator’s committee images use. |
The operator hands this out via their release notes, not via the
wire. The zipnet SDK does not bake per-instance MR_TD mappings in —
there is no table of “acme.mainnet requires MR_TD abc…” inside
the crate. Keeping that mapping client-side is the operator’s
responsibility, published out of band.
When the operator rotates the image, your old quote stops validating; the fix is to rebuild with the new MR_TD and redeploy. There is no auto-discovery of acceptable measurements on the wire.
Multi-variant deployments
During a rollout, an operator may accept multiple client MR_TDs
simultaneously — usually the old and the new during a staged migration.
You only need to match one of them. The precomputed hex files in
target/<profile>/tdx-artifacts/<crate>/.../ tell you what your image
reports; compare against the list the operator publishes.
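That comparison is just a case-insensitive membership check. A sketch, assuming the operator publishes their accepted measurements as plain hex strings (the function and list format are hypothetical, not SDK API):

```rust
// Check our image's precomputed MR_TD (from the build artifacts
// directory described above) against the operator's published list.
fn mrtd_accepted(ours: &str, published: &[&str]) -> bool {
    let ours = ours.trim();
    published.iter().any(|m| m.trim().eq_ignore_ascii_case(ours))
}

fn main() {
    let published = ["aa11bb22", "cc33dd44"]; // hypothetical accepted MR_TDs
    assert!(mrtd_accepted("AA11BB22\n", &published)); // hex case and whitespace ignored
    assert!(!mrtd_accepted("ee55ff66", &published));
}
```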
Sealing secrets inside the enclave
Zipnet’s current SDK does not expose a sealed-storage helper — each
Zipnet::bind generates a fresh per-binding DH identity in process
memory. That is fine for the default anonymous-use-case model, where
identity is meant to rotate.
If you need stable identity across enclave reboots for a reputation use case, you will need to persist state to TDX sealed storage yourself today. That is out of scope for the SDK and likely to land as a mosaik primitive rather than a zipnet feature; watch the mosaik release notes.
Falling back to non-TDX for development
If you’re writing integration tests and don’t want a TDX VM in the
loop, build without the tee-tdx feature and use a deployment whose
operator has disabled TDX gating. Typical arrangement:
- Production and staging: tee-tdx on both sides.
- Local dev / CI: tee-tdx off on both sides.
The operator runs the dev instance without the Tdx validator on
committee admissions; you build your client without the tee-tdx
feature. Both sides’ mocks line up.
Failure modes
The error the SDK surfaces when TDX is involved is
Error::Attestation(String).
Common causes:
- You built with tee-tdx but aren’t running inside a TDX guest (hardware refuses to quote).
- Your MR_TD differs from the operator’s. Rebuild with their image.
- The operator rotated MR_TD and you haven’t. Rebuild.
ConnectTimeout can also stem from TDX mismatches on deployments
that surface attestation failures silently at the bond layer; see
Troubleshooting.
Troubleshooting from the user side
audience: users
Failure modes you can observe from your own agent, mapped to the SDK’s error enum and the fastest check for each.
The error enum
pub enum Error {
WrongUniverse { expected: mosaik::NetworkId, actual: mosaik::NetworkId },
ConnectTimeout,
Attestation(String),
Shutdown,
Protocol(String),
}
Five variants. The two you will hit most in development are
ConnectTimeout and WrongUniverse. Everything else is either a
real runtime condition or lower-level plumbing surfaced through
Protocol.
Symptom: bind returns ConnectTimeout
This is the single most common dev-time error. It means the SDK could not bond to a peer serving your instance within the connect deadline. In descending order of likelihood:
1. Typo in the instance name
Your code derives UniqueIds from "zipnet." + instance_name via
blake3. A one-character change produces a completely different id,
and nobody is serving it.
Fix: double-check the name against the operator’s handoff. Prefer pinning it as a compile-time constant so typos become build errors:
use zipnet::{Zipnet, UniqueId, UNIVERSE};
const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");
let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;
2. Operator’s committee isn’t up
The name is right, but nobody is currently serving it. The SDK
cannot distinguish “nobody serves this” from “operator isn’t up yet”
without an on-network registry — both surface as ConnectTimeout.
Fix: ask the operator whether the deployment is live.
3. Bootstrap peers unreachable
Even if the instance name is right and the committee is up, your network never bonded to the universe — so it never found the committee. Usually shows up alongside no peer-catalog growth.
Fix: check the bootstrap peer list. See Connecting — Cold-start checklist.
4. TDX posture mismatch
Silent rejection at the bond layer from a TDX-gated deployment
often looks like ConnectTimeout rather than a clear Attestation
error. Common when your client is built without the tee-tdx
feature against a TDX-gated operator.
Fix: see TEE-gated deployments.
Symptom: bind returns WrongUniverse
Your Arc<Network> was built against a different NetworkId than
zipnet::UNIVERSE. The error payload tells you both values:
match zipnet::Zipnet::bind(&network, "acme.mainnet").await {
Err(zipnet::Error::WrongUniverse { expected, actual }) => {
tracing::error!(%expected, %actual, "network on wrong universe");
}
…
}
Fix: build the network with Network::new(UNIVERSE) or
Network::builder(UNIVERSE). There is no way to tunnel zipnet over
a non-universe network.
Symptom: bind returns Attestation
TDX attestation failed. The string payload names the specific failure from the mosaik TDX stack.
Common causes:
- You built with tee-tdx but aren’t running inside a TDX guest.
- Your MR_TD differs from the operator’s expected value (fresh image you haven’t rebuilt, or operator rotated).
- Your quote has expired.
Symptom: publish returns Outcome::Collided
Another client hashed to the same slot this round. Both payloads get XOR-corrupted; no observable message lands for either of you.
Fix: retry on the next round. See Publishing — Retry policy.
Persistent collisions are a signal that the deployment is oversubscribed
for its num_slots — an operator-side tuning problem, not a user one.
Collision probability per pair per round is 1 / num_slots; for N
clients the expected number of collisions per round is
C(N, 2) / num_slots.
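That estimate is easy to run for your own deployment size — the function below is just the formula above as code, not an SDK helper:

```rust
// Expected colliding pairs per round = C(N, 2) / num_slots,
// exactly as stated in the text.
fn expected_collisions(n_clients: u64, num_slots: u64) -> f64 {
    let pairs = (n_clients * (n_clients - 1) / 2) as f64;
    pairs / num_slots as f64
}

fn main() {
    // 10 active clients on 128 slots: ~0.35 colliding pairs per round.
    assert_eq!(expected_collisions(10, 128), 0.3515625);
    // 64 clients on 128 slots: 15.75 — badly oversubscribed.
    assert_eq!(expected_collisions(64, 128), 15.75);
}
```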
Symptom: publish returns Outcome::Dropped
The aggregator never forwarded your envelope into a committed aggregate. Usually transient:
- Aggregator was offline that round.
- Your registration hadn’t propagated yet (first few seconds after
bind).
Fix: retry. Repeated Dropped across many rounds means the
aggregator is unreachable from you — check the peer catalog and
bootstrap peers, then contact the operator.
Symptom: subscription sees no new rounds for a long time
Two possibilities:
1. The committee is stuck
The cluster is not finalizing rounds. Contact the operator.
2. Your binding hasn’t caught up yet
bind waits for the first live round roster before returning, so
once you have a Zipnet handle, round delivery should start at the
next round boundary. If it does not, you are not reaching the
broadcast collection’s group — same checks as for ConnectTimeout
(bootstrap, UDP egress, TDX).
Symptom: publish or subscribe returns Shutdown
The binding is closing. Either you called shutdown(), dropped every
clone of the handle, or the underlying Network went down.
Fix: shutdown is idempotent-ish — further calls keep returning
Shutdown. If this is unexpected, check that the Arc<Network> is
still alive and that no other part of your code called shutdown on
the handle.
Symptom: Error::Protocol(…) with an opaque string
The SDK bubbled up a lower-level mosaik or zipnet-protocol failure. The string content is for humans — do not pattern-match on it.
Fix: enable verbose logging and inspect the mosaik-layer event stream:
RUST_LOG=info,zipnet=debug,mosaik=info cargo run
If the root cause is in mosaik, the mosaik book has better diagnostics than this page can. Open a zipnet issue with the log excerpt if the failure looks zipnet-specific.
Symptom: subscriber lags and misses rounds
Your round handler is slower than the deployment’s round cadence.
Internal broadcast channels drop rounds rather than stall the SDK,
so you see gaps in round.id().
Fix: offload heavy per-round work to a separate task. See Reading — Gap detection and catch-up.
Symptom: my client compiled against one version, the operator upgraded
Mosaik pinned to =0.3.17 on both sides; zipnet and zipnet-proto
baselines must also match the deployment. If WIRE_VERSION or
round-parameter defaults change, your client derives different
internal IDs and bind returns ConnectTimeout.
Fix: keep your zipnet dep version aligned with the operator’s release notes. Mosaik stays pinned.
When to escalate to the operator
- bind consistently fails with ConnectTimeout after the name, bootstrap, and universe have all been verified.
- publish keeps returning Outcome::Dropped across many rounds.
- Your subscription opens but sees no rounds finalize over several round periods.
When you escalate, include:
- Your mosaik version (=0.3.17) and zipnet SDK version.
- The instance name you are binding to.
- Whether you built with tee-tdx and, if so, your client’s MR_TD.
- A 60-second log excerpt at RUST_LOG=info,zipnet=debug,mosaik=info.
API reference
audience: users
A compact reference of the surface the zipnet facade crate exposes.
Link-in-book pages cover the “how”; this page is the “what”.
The whole import story
Almost every user-side agent pulls from exactly one module:
use zipnet::{
// The universe constant.
UNIVERSE,
// The handle and its stream type.
Zipnet, BroadcastStream,
// Identifiers and macros.
UniqueId, NetworkId, Tag, unique_id, instance_id,
// Value types returned by publish / subscribe.
Receipt, Outcome, Round, Message,
// Protocol types re-exported from zipnet-proto.
BroadcastRecord, RoundId,
// Error model.
Error, Result,
};
The instance_id! macro is re-exported at the crate root via
#[macro_export], so zipnet::instance_id!("name") works alongside
the runtime zipnet::instance_id(name) function.
Constants
| Item | Type | Role |
|---|---|---|
| zipnet::UNIVERSE | NetworkId | The shared mosaik universe every zipnet deployment lives on. Build your Network against it. |
Handle
#[derive(Clone)]
pub struct Zipnet { /* opaque */ }
Cloneable; all clones share one driver task. Drop every clone or call
shutdown to tear down a binding.
| Method | Returns | Purpose |
|---|---|---|
| Zipnet::bind(&Arc<Network>, &str) | Result<Self> | Bind by instance name. |
| Zipnet::bind_by_id(&Arc<Network>, UniqueId) | Result<Self> | Bind by pre-derived id (use with instance_id!). |
| .publish(impl Into<Vec<u8>>) | Result<Receipt> | Publish a payload; resolves after the carrying round finalizes. |
| .subscribe() | Result<BroadcastStream> | Stream of finalized rounds. |
| .shutdown() | Result<()> | Flush in-flight publishes and tear down. |
See Publishing and Reading for usage patterns.
Identifier helpers
pub fn zipnet::instance_id(name: &str) -> UniqueId;
// macro:
pub macro zipnet::instance_id($name:literal) { /* compile-time */ }
Both produce identical bytes — blake3("zipnet." + name). Prefer the
macro when the name is a literal so typos fail at build time.
Value types
pub struct Receipt {
pub round: RoundId,
pub slot: usize,
pub outcome: Outcome,
}
pub enum Outcome {
Landed, // happy path
Collided, // slot collision; retry next round
Dropped, // aggregator never forwarded; retry
}
pub struct Round { /* opaque */ }
impl Round {
pub fn id(&self) -> RoundId;
pub fn messages(&self) -> impl Iterator<Item = Message> + '_;
pub fn raw(&self) -> &BroadcastRecord;
}
pub struct Message { /* opaque */ }
impl Message {
pub fn bytes(&self) -> &[u8];
pub fn slot(&self) -> usize;
}
pub struct BroadcastStream;
impl futures::Stream for BroadcastStream {
type Item = Round;
}
Round::messages() yields only slots that decoded cleanly — malformed
or colliding slots are filtered out inside the SDK. Round::raw()
escapes to the underlying BroadcastRecord for the rare case you need
it.
Errors
pub type Result<T, E = Error> = core::result::Result<T, E>;
#[derive(Debug, thiserror::Error)]
pub enum Error {
WrongUniverse { expected: NetworkId, actual: NetworkId },
ConnectTimeout,
Attestation(String),
Shutdown,
Protocol(String),
}
See Troubleshooting for a per-variant diagnostic checklist.
Re-exports from mosaik
| Item | From | Use |
|---|---|---|
| UniqueId | mosaik::UniqueId | Alias for 32-byte intent-addressed identifiers. |
| NetworkId | mosaik::NetworkId | Type of UNIVERSE and WrongUniverse fields. |
| Tag | mosaik::Tag | Peer-catalog tag type. Rarely needed directly. |
| unique_id! | mosaik::unique_id! | Compile-time UniqueId construction. |
Re-exports from zipnet-proto
| Item | Role |
|---|---|
| BroadcastRecord | The finalized round record inside a Round. |
| RoundId | Monotonic round counter; RoundId::next() to advance. |
What you do NOT import
- zipnet_node::* — committee and role internals. Users do not construct CommitteeMachines or run committee Raft groups.
- mosaik::groups::GroupKey — you do not have committee secrets.
- Any raw StreamId / StoreId / GroupId — the SDK derives them from the instance name. Do not try to pin them yourself.
If you find yourself reaching for these, you are probably writing an operator or contributor concern. Revisit What you need from the operator.
Version compatibility
| Dependency | Version | Note |
|---|---|---|
| mosaik | =0.3.17 | Pin exactly; minor versions change wire formats. |
| zipnet | follow the deployment’s release notes | Keep in lockstep with the operator’s version. |
| tokio | 1.x | Any compatible minor. |
| futures | 0.3 | For StreamExt::next on BroadcastStream. |
When the operator announces a deployment upgrade, they should publish the zipnet version to use. Users rebuild and redeploy in lockstep.
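Put together, the table maps onto a dependency section like this — the zipnet version is illustrative; follow the operator’s release notes:

```toml
[dependencies]
mosaik = "=0.3.17"   # exact pin; minor versions change wire formats
zipnet = "0.1"       # keep in lockstep with the deployment
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
futures = "0.3"      # StreamExt::next on BroadcastStream
```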
Deployment overview
audience: operators
A zipnet deployment runs as one service among many on a shared
mosaik universe — a single NetworkId that hosts zipnet alongside
other mosaik services. What you stand up is an instance of zipnet
under a short, namespaced name you pick (e.g. acme.mainnet).
Multiple instances coexist on the same universe concurrently, each
with its own committee, ACL, round parameters, and committee MR_TD.
If you haven’t yet, read the Quickstart — it walks you end-to-end from a fresh checkout to a live instance. This page gives the architectural background the runbooks later in this section refer back to.
The shared universe model
- The universe constant is zipnet::UNIVERSE = unique_id!("mosaik.universe"). Override only for an isolated federation via ZIPNET_UNIVERSE; in the common case, leave it alone.
- All your nodes — committee servers, aggregator, clients — join that same universe. Mosaik’s standard peer discovery (/mosaik/announce gossip plus the Mainline DHT bootstrap) handles reachability. You don’t configure streams, groups, or IDs by hand.
- The instance is identified by ZIPNET_INSTANCE (e.g. acme.mainnet). Every sub-ID — committee GroupId, submit StreamId, broadcasts StoreId — is derived from that name, so typos surface as ConnectTimeout rather than a config error.
Publishers bond to your instance knowing only three things: the
universe NetworkId, the instance name, and (for TDX-gated
deployments) your committee MR_TD. You hand those out in release
notes or docs; there is no on-network registry to publish to and
nothing to advertise.
Three node roles
A zipnet deployment has three kinds of nodes. You — the operator — will run at least the first two. The third is optional (most publishers are external users running their own clients).
| Role | Count | Trust status | Resource profile |
|---|---|---|---|
| Committee server | 3 or more (odd) | any-trust: at least one must be honest for anonymity; all must be up for liveness in v1 | low CPU, modest RAM, stable identity, low churn |
| Aggregator | 1 (v1) | untrusted for anonymity, trusted for liveness | higher CPU + bandwidth, can churn |
| Publishing client | many | TDX-attested in production; untrusted for liveness | ephemeral; any churn is tolerated |
What every node needs
- Outbound UDP to the internet (iroh / QUIC transport) and to mosaik relays.
- A few MB of RAM; committee servers need more during large-round replay.
- A clock within a few seconds of the rest of the universe (Raft tolerates skew but not arbitrary drift).
- `ZIPNET_INSTANCE=<name>` set to the same instance name on every node in that deployment.
What only committee servers need
- A stable `PeerId` across restarts. Set `ZIPNET_SECRET` to any string — it is hashed with blake3 to derive the node’s long-term iroh identity. Rotating it invalidates every bond.
- Access to the shared committee secret, passed as `ZIPNET_COMMITTEE_SECRET`. This gates admission to the Raft group. Distribute it out of band (vault, secrets manager, k8s secret). Anyone holding it can join the committee — treat it like a root credential.
- In production, a TDX host. Mosaik ships the TDX image builder; you call `mosaik::tee::tdx::build::ubuntu()` from your `build.rs` and get a launch script, initramfs, OVMF, and a precomputed MR_TD at build time. See the Quickstart’s TDX section.
- Durable storage is not required in v1 (state is in memory). A restarted server rejoins and catches up by snapshot.
What only aggregators need
- More network bandwidth than committee servers. The aggregator receives every client envelope and emits a single aggregate per round.
- A stable `PeerId` is strongly recommended — clients often use the aggregator as a discovery bootstrap.
- The aggregator does not need the committee secret. It is untrusted for anonymity.
What only clients need
- The universe `NetworkId`, instance name, and (for TDX-gated instances) your committee MR_TD. That is the whole handshake.
- A TDX host if the instance is TDX-gated. See Security posture checklist.
How the three talk
clients ── ClientEnvelope stream ─────► aggregator
│
AggregateEnvelope stream
│
▼
committee servers
│
Raft-replicated apply
│
▼
Broadcasts collection (readable by anyone)
Clients and the aggregator are not members of the committee’s Raft group; they observe the final broadcasts through a replicated collection.
Minimum viable deployment
Three committee servers + one aggregator + a handful of clients is the smallest deployment where anonymity holds meaningfully. Two committee servers will technically run but any one of them can deanonymize the set — stick to three or more.
TDX host A TDX host B TDX host C
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ zipnet- │ │ zipnet- │ │ zipnet- │
│ server #1 │ │ server #2 │ │ server #3 │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
└────────────────────┼────────────────────┘
│ Raft / mosaik group
▼
┌───────────────────┐
│ zipnet-aggregator │ (non-TDX host, well-connected)
└─────────┬─────────┘
│
▼
external publishers
(TDX where gated, else
operator-trusted hosts)
Each box runs ZIPNET_INSTANCE=acme.mainnet and joins
zipnet::UNIVERSE over iroh; mosaik discovery wires the rest.
Running many instances side by side
Operators routinely run several instances — production, a public testnet, internal dev — on the same universe. Each has its own instance name, its own committee, its own MR_TD pin, its own ACL. A host can run one instance or many; run a separate unit per instance:
systemctl start zipnet-server@acme-mainnet
systemctl start zipnet-server@preview.alpha
systemctl start zipnet-server@dev.ops
Each unit sets a different ZIPNET_INSTANCE; they share the universe
and the discovery layer, and appear to publishers as three distinct
Zipnet::bind targets.
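As a sketch, the template unit behind those commands could look like the following — the unit name matches the commands above, but the binary path and env-file layout are illustrative choices, not something zipnet ships:

```ini
# /etc/systemd/system/zipnet-server@.service (illustrative)
[Unit]
Description=zipnet committee server (instance %i)
After=network-online.target
Wants=network-online.target

[Service]
# One env file per instance, e.g. /etc/zipnet/acme-mainnet.env, holding
# ZIPNET_INSTANCE, ZIPNET_COMMITTEE_SECRET, ZIPNET_SECRET, round params.
EnvironmentFile=/etc/zipnet/%i.env
ExecStart=/usr/local/bin/zipnet-server
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```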
See also
- Running a committee server
- Running the aggregator
- Running a client
- Rotations and upgrades
- Designing coexisting systems on mosaik — full rationale for the shared-universe model, for operators who want to understand why the instance is the unit of identity.
Quickstart — stand up an instance
audience: operators
This page walks you from a fresh checkout to a live zipnet instance that external publishers can reach with one line of code. Read Deployment overview first for the architectural background; this page assumes it.
Who runs a zipnet instance
Typical deployments:
- A rollup or app offering an encrypted mempool. The team runs the committee; user wallets publish sealed transactions; the sequencer or builder reads them ordered and opaque-to-sender, and decrypts at block-build time via whatever mechanism they prefer (threshold decryption, TEE unsealing).
- An MEV auction team hosting a permissioned order-flow channel. The team runs the committee; whitelisted searchers publish intents; every connected builder reads the same ordered log.
- A governance coalition running anonymous signalling. The coalition runs the committee; delegated wallets signal anonymously; anyone can tally.
What’s common: you want a bounded participant set — which you authenticate via TEE attestation and a ticket class — to publish messages without any single party (yourself included) being able to link message to sender. You run the committee and the aggregator. Participants bring their own TEE-attested client software, typically from a TDX image you also publish.
One-paragraph mental model
Zipnet runs as one service among many on a shared mosaik universe
— a single NetworkId that hosts zipnet alongside other mosaik
services (signers, storage, oracles). Your job as an operator is to
stand up an instance of zipnet under a name you pick (e.g.
acme.mainnet) and keep it running. External agents bind to your
instance with Zipnet::bind(&network, "acme.mainnet") — they compile
the name in from their side, so there is no registry to publish to
and nothing to advertise. Your servers simply need to be reachable.
What you’re running
A minimum instance is:
| Role | Count | Hosted where |
|---|---|---|
| Committee server | 3 or more (odd) | TDX-enabled hosts you operate |
| Aggregator | 1 (v1) | Any host with outbound UDP |
| (optional) Your own publishing clients | any | TDX-enabled if the instance is gated |
All of these join the same shared mosaik universe. The committee and aggregator advertise on the shared peer catalog; external publishers reach them through mosaik’s discovery without any further config from you.
What defines your instance
Your instance is fully identified by three pieces of configuration:
| # | Field | Notes |
|---|---|---|
| 1 | instance name | Short, stable, namespaced string (e.g. acme.mainnet). Folds into the committee GroupId, submit StreamId, and broadcasts StoreId. |
| 2 | universe NetworkId | Almost always zipnet::UNIVERSE. Override only if you run an isolated federation. |
| 3 | ticket class | What publishers must present: TDX MR_TD, JWT issuer, or both. Also folds into GroupId. |
Round parameters (num_slots, slot_bytes, round_period,
round_deadline) are configured per-instance via env vars and
published at runtime in the LiveRoundCell collection that
publishers read. They are immutable for the instance’s lifetime —
bumping any of them requires a new instance name.
Items 1 and 3 fold into the instance’s derived IDs. Change either and the instance’s identity changes, meaning publishers compiled against the old values can no longer bond. See Designing coexisting systems on mosaik for the derivation.
Minimal smoke test
Before you touch hardware, confirm the pipeline works end-to-end on your laptop. The deterministic check is the integration test that exercises three committee servers + one aggregator + two clients over real mosaik transports in one tokio runtime:
cargo test -p zipnet-node --test e2e one_round_end_to_end
A green run in roughly 10 seconds tells you the crypto, consensus, round lifecycle, and mosaik transport are all healthy in your checkout. If it fails, nothing else on this page is going to work — investigate before touching hardware.
Exercising the binaries directly (optional)
If you want to watch the three role binaries run as separate processes — useful for shaking out systemd units, env vars, or firewall rules — bootstrap them by hand on one host. Localhost discovery over fresh iroh relays is slow, so give the first round up to a minute to land.
# terminal 1 — seed committee server; grab its peer= line from stdout
ZIPNET_INSTANCE="dev.local" \
ZIPNET_COMMITTEE_SECRET="dev-committee-secret" \
ZIPNET_SECRET="seed-1" \
./target/debug/zipnet-server
# terminals 2+3 — remaining committee servers, bootstrapped off #1
ZIPNET_INSTANCE="dev.local" \
ZIPNET_COMMITTEE_SECRET="dev-committee-secret" \
ZIPNET_SECRET="seed-2" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
./target/debug/zipnet-server
# terminal 4 — aggregator
ZIPNET_INSTANCE="dev.local" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
./target/debug/zipnet-aggregator
# terminal 5 — reference publisher
ZIPNET_INSTANCE="dev.local" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
ZIPNET_MESSAGE="hello from the smoke test" \
./target/debug/zipnet-client
A healthy run prints round finalized on the committee servers
within a minute and the client’s payload echoes back on the
subscriber side. TDX is off in this mode — production instances
re-enable it (see below).
What every server process does for you
When `zipnet-server` starts, it:
- Joins the shared universe network (`zipnet::UNIVERSE`, or whatever you set `ZIPNET_UNIVERSE` to).
- Derives every instance-local id from `ZIPNET_INSTANCE` — committee `GroupId`, the submit stream, the broadcasts collection, the registries.
- Bonds with its peers using the committee secret and TDX measurement.
- Advertises itself on the shared peer catalog via mosaik’s standard `/mosaik/announce` gossip. Publishers that compile in the same instance name reach the same `GroupId` and bond automatically.
- Accepts rounds from the aggregator and replicates broadcasts through the committee Raft group.
You do not configure streams, collections, or group ids by hand, and you do not publish an announcement anywhere. The instance name is the only piece of identity you manage; everything else is either derived or taken care of by mosaik.
Building a TDX image (production path)
For production, every committee server and every publishing client runs inside a TDX guest. Mosaik ships the image builder — you do not compose QEMU, OVMF, kernels, and initramfs yourself, and you do not compute MR_TD by hand.
In the committee server crate’s build.rs:
// crates/zipnet-server/build.rs
fn main() {
mosaik::tee::tdx::build::ubuntu()
.with_default_memory_size("4G")
.build();
}
Add to Cargo.toml:
[dependencies]
mosaik = { version = "0.3", features = ["tdx"] }
[build-dependencies]
mosaik = { version = "0.3", features = ["tdx-builder-ubuntu"] }
After cargo build --release you get, in
target/release/tdx-artifacts/zipnet-server/ubuntu/:
| Artifact | What it’s for |
|---|---|
| `zipnet-server-run-qemu.sh` | Self-extracting launcher. This is what you invoke on a TDX host. |
| `zipnet-server-mrtd.hex` | The 48-byte measurement. Publishers pin against this. |
| `zipnet-server-vmlinuz` | Raw kernel, in case you repackage. |
| `zipnet-server-initramfs.cpio.gz` | Raw initramfs. |
| `zipnet-server-ovmf.fd` | Raw OVMF firmware. |
Mosaik computes MR_TD at build time by parsing the OVMF, the kernel and the initramfs according to the TDX spec — the same value the TDX hardware will report at runtime. You ship this hex string alongside your announcement; a client whose own image does not measure to the same MR_TD cannot join the instance. See users/handshake-with-operator for the matching client-side flow.
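The attestation check itself happens inside mosaik, but the pin comparison on the consuming side amounts to a hex check like this hypothetical helper (not a mosaik API):

```rust
/// Illustrative check of a pinned MR_TD against the value a guest reports.
/// Normalizing case and whitespace matters because the pin travels through
/// release notes and copy-paste; the length check enforces 48 bytes.
fn mrtd_matches(pinned_hex: &str, reported_hex: &str) -> bool {
    let norm = |s: &str| s.trim().to_ascii_lowercase();
    let (a, b) = (norm(pinned_hex), norm(reported_hex));
    // MR_TD is 48 bytes, i.e. 96 hex characters.
    a.len() == 96 && a.chars().all(|c| c.is_ascii_hexdigit()) && a == b
}

fn main() {
    let pin = "ab".repeat(48); // 96 hex chars = 48 bytes
    assert!(mrtd_matches(&pin, &pin.to_ascii_uppercase())); // case-insensitive
    assert!(!mrtd_matches(&pin, &"cd".repeat(48)));         // wrong measurement
    assert!(!mrtd_matches("deadbeef", "deadbeef"));         // wrong length
}
```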
Alpine variant (mosaik::tee::tdx::build::alpine(), feature
tdx-builder-alpine) produces a ~5 MB image versus Ubuntu’s ~25 MB,
at the cost of musl. Use Alpine for publishers where image size
matters; keep Ubuntu for committee servers unless you have a specific
reason otherwise.
Instance naming and your users’ handshake
Publishers bond to your instance by knowing three things: the
universe NetworkId, the instance name, and (if TDX-gated) the
MR_TD of your committee image. That is the complete handoff — no
registry, no dynamic lookup, no on-network advertisement.
Publish these via whatever channel suits your users: release notes,
a docs page, direct handoff in a setup email. Users bake the
instance name (or its derived UniqueId) into their code at compile
time.
Instance names share a flat namespace per universe. Two operators
picking the same name collide in the committee group and neither
works correctly — mosaik has no mechanism to prevent this and no way
to tell you it happened. Namespace aggressively: <org>.<purpose>.<env>,
for example acme.mixer.mainnet. If in doubt, include an
irrevocable random suffix once and forget about it
(acme.mixer.mainnet.8f3c1a).
Retiring an instance is just stopping every server under that name.
Publishers still trying to bond will see ConnectTimeout; they
update their code to the new name and rebuild.
Going live
Once the smoke test passes on staging hardware:
- Build your production TDX images (committee + client). Publish the two `mrtd.hex` values to whatever channel your users consume (docs site, release notes, signed announcement).
- Stand up three TDX committee servers on geographically separate hosts, with the production `ZIPNET_INSTANCE` and `ZIPNET_COMMITTEE_SECRET`.
- Stand up the aggregator on a non-TDX but well-connected host.
- Verify the committee has elected a leader and the aggregator is bonded to the submit stream. Your own aggregator metrics are the easiest check; on the committee side, exactly one server should report `mosaik_groups_leader_is_local = 1`.
- Hand publishers your instance name, one universe bootstrap `PeerId`, and (if TDX-gated) your committee MR_TD. That is the entirety of their onboarding.
Running many instances side by side
Operators routinely run several instances — production, a public testnet, internal dev — on the same universe. Each has its own instance name, its own committee, its own MR_TD pin, its own ACL. A host can run one instance or many; run a separate unit per instance:
systemctl start zipnet-server@acme-mainnet
systemctl start zipnet-server@preview.alpha
systemctl start zipnet-server@dev.ops
Each unit sets a different ZIPNET_INSTANCE; they share the universe
and the discovery layer, and appear to publishers as three distinct
Zipnet::bind targets.
Next reading
- Running a committee server — every environment variable and what it does.
- Running the aggregator — the untrusted-but-load-bearing node.
- Rotations and upgrades — retiring an instance, rebuilding TDX images, rotating committee secrets.
- Monitoring and alerts — the metrics that matter in production.
- Incident response — stuck rounds, split brain, expired MR_TDs.
- Security posture checklist — what committee operators must protect.
- Designing coexisting systems on mosaik — the shared-universe model in full, for operators who want to understand why the instance is the unit of identity.
audience: operators
End-to-end deploy example — one TDX host
A worked, copy-pasteable runbook that stands up a complete zipnet
instance on a single TDX-capable host reachable at
ubuntu@tdx-host. The topology is the minimum viable deployment:
three committee servers, one aggregator, one reference publisher,
all co-located as separate TDX guests (plus one non-TDX process for
the aggregator) on the same physical host.
Use this recipe for staging, integration, or a demo. For production, split the three committee servers onto three independently-operated TDX hosts — the steps per host are identical; only the bootstrap wiring changes.
What you are about to build
ubuntu@tdx-host (one physical TDX server)
┌──────────────────────────────────────────────────────────────┐
│ TDX guest #1 TDX guest #2 TDX guest #3 │
│ zipnet-server-1 zipnet-server-2 zipnet-server-3 │
│ │ │ │ │
│ └────── Raft / mosaik group (committee) ──┘ │
│ │ │
│ ┌─────────────▼──────────────┐ │
│ │ zipnet-aggregator (no TDX) │ │
│ └─────────────┬──────────────┘ │
│ │ │
│ ┌────────────▼────────────┐ │
│ │ TDX guest #4 │ │
│ │ zipnet-client (demo) │ │
│ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
The instance name used throughout is demo.tdx. Swap it for your
own namespaced name before running anything in production
(<org>.<purpose>.<env>; see
Quickstart — naming the instance).
Prerequisites
On your workstation:
- A checkout of this repo.
- Rust 1.93 (`rustup show` confirms `rust-toolchain.toml`).
- SSH access to the host: `ssh ubuntu@tdx-host` returns a shell.
- `scp` and `rsync` available locally.
On ubuntu@tdx-host:
- Bare-metal or cloud host with Intel TDX enabled in BIOS and a TDX kernel installed. `ls /dev/tdx_guest` exists on the host and the kernel module `kvm_intel` is loaded with `tdx=Y`. If you are unsure, run `dmesg | grep -i tdx`.
- `qemu-system-x86_64` at a version the mosaik launcher supports (8.2+). The launcher script will tell you if the local QEMU is too old.
- A user that can access `/dev/kvm` and `/dev/tdx_guest` without root. On Ubuntu, add `ubuntu` to the `kvm` and `tdx` groups.
- `tmux` (used below to keep each role’s logs visible). Any process supervisor works — systemd user units, `screen`, `nohup`. The commands that follow use `tmux` because it is the lowest-ceremony option.
- Outbound UDP to the internet for iroh / QUIC and mosaik relays. No inbound ports need to be opened — mosaik’s hole-punching layer handles reachability.
Two small decisions fixed for this example:
| Knob | Value used here | Why |
|---|---|---|
| `ZIPNET_INSTANCE` | `demo.tdx` | Short, obvious, collision-unlikely. Rename freely. |
| `ZIPNET_COMMITTEE_SECRET` | `openssl rand -hex 32` once, pasted into the env for all three servers | Shared admission secret for the committee. Clients and the aggregator must not see this value. |
| `ZIPNET_MIN_PARTICIPANTS` | `1` | So the single demo client triggers rounds. Raise to >= 2 for real anonymity. |
| `ZIPNET_ROUND_PERIOD` | `3s` | Enough headroom on a shared host to see logs land in order. |
Step 1 — Build the TDX artifacts on your workstation
From the repo root, build everything release-mode. The build.rs
scripts in zipnet-server and zipnet-client invoke the mosaik
TDX builder and drop launchable artifacts under
target/release/tdx-artifacts/.
cargo build --release
When this finishes you have:
target/release/
zipnet-aggregator # plain binary; runs on any host
tdx-artifacts/
zipnet-server/ubuntu/
zipnet-server-run-qemu.sh # self-extracting launcher
zipnet-server-mrtd.hex # 48-byte committee measurement
zipnet-server-vmlinuz
zipnet-server-initramfs.cpio.gz
zipnet-server-ovmf.fd
zipnet-client/alpine/
zipnet-client-run-qemu.sh
zipnet-client-mrtd.hex # 48-byte client measurement
zipnet-client-vmlinuz
zipnet-client-initramfs.cpio.gz
zipnet-client-ovmf.fd
Record both mrtd.hex values — these are the MR_TDs you will
publish to readers alongside the instance name.
SERVER_MRTD=$(cat target/release/tdx-artifacts/zipnet-server/ubuntu/zipnet-server-mrtd.hex)
CLIENT_MRTD=$(cat target/release/tdx-artifacts/zipnet-client/alpine/zipnet-client-mrtd.hex)
echo "committee MR_TD: $SERVER_MRTD"
echo "client MR_TD: $CLIENT_MRTD"
Step 2 — Copy artifacts to the host
ssh ubuntu@tdx-host 'mkdir -p ~/zipnet/{server,client,aggregator,logs}'
rsync -avz --delete \
target/release/tdx-artifacts/zipnet-server/ubuntu/ \
ubuntu@tdx-host:~/zipnet/server/
rsync -avz --delete \
target/release/tdx-artifacts/zipnet-client/alpine/ \
ubuntu@tdx-host:~/zipnet/client/
scp target/release/zipnet-aggregator \
ubuntu@tdx-host:~/zipnet/aggregator/
The launcher scripts are self-extracting — they embed kernel,
initramfs, and OVMF. You do not need to copy the raw vmlinuz /
initramfs / ovmf.fd files unless you plan to repackage.
Step 3 — Pick a committee secret
On the TDX host, once, generate the shared committee secret and park it in a file you will source into each server’s environment. Anyone with this value can join the committee, so treat it as a root credential.
ssh ubuntu@tdx-host
# on the host
umask 077
openssl rand -hex 32 > ~/zipnet/committee-secret
chmod 600 ~/zipnet/committee-secret
Step 4 — Start the first committee server and capture its PeerId
The first server has no one to bootstrap against, so it starts
without ZIPNET_BOOTSTRAP. Its startup line prints
peer=<hex>… — capture that and reuse it as the bootstrap hint for
every following process.
Open a tmux session on the host and start server 1:
# on the host
tmux new-session -d -s zipnet-s1 -n server-1
tmux send-keys -t zipnet-s1:server-1 "
ZIPNET_INSTANCE=demo.tdx \
ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
ZIPNET_SECRET=server-1-seed \
ZIPNET_MIN_PARTICIPANTS=1 \
ZIPNET_ROUND_PERIOD=3s \
ZIPNET_ROUND_DEADLINE=15s \
RUST_LOG=info,zipnet_node=info \
~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-1.log
" C-m
Wait five or ten seconds for the TDX guest to come up, then pull the PeerId out of the log:
# on the host
BOOTSTRAP=$(grep -oE 'peer=[0-9a-f]{10,}' ~/zipnet/logs/server-1.log | head -1 | cut -d= -f2)
echo "bootstrap peer: $BOOTSTRAP"
If $BOOTSTRAP is empty, the guest has not finished booting — the
first round of QEMU + TDX can take 30 s on a cold host. Re-run the
grep after a beat.
What if I don’t see the `peer=` line? The self-extracting launcher prints its own boot banner first. The zipnet line (`zipnet up: network=<universe> instance=demo.tdx peer=...`) only appears once the binary inside the guest has announced. If it is still missing after a minute, `less ~/zipnet/logs/server-1.log` and look for QEMU-level errors — typically TDX not enabled, or `/dev/kvm` permissions.
Step 5 — Start the remaining two committee servers
Each server gets a distinct ZIPNET_SECRET (so each derives a
unique PeerId) and bootstraps against server 1.
# on the host — still inside your SSH session
tmux new-session -d -s zipnet-s2 -n server-2
tmux send-keys -t zipnet-s2:server-2 "
ZIPNET_INSTANCE=demo.tdx \
ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
ZIPNET_SECRET=server-2-seed \
ZIPNET_BOOTSTRAP=$BOOTSTRAP \
ZIPNET_MIN_PARTICIPANTS=1 \
ZIPNET_ROUND_PERIOD=3s \
ZIPNET_ROUND_DEADLINE=15s \
RUST_LOG=info,zipnet_node=info \
~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-2.log
" C-m
tmux new-session -d -s zipnet-s3 -n server-3
tmux send-keys -t zipnet-s3:server-3 "
ZIPNET_INSTANCE=demo.tdx \
ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
ZIPNET_SECRET=server-3-seed \
ZIPNET_BOOTSTRAP=$BOOTSTRAP \
ZIPNET_MIN_PARTICIPANTS=1 \
ZIPNET_ROUND_PERIOD=3s \
ZIPNET_ROUND_DEADLINE=15s \
RUST_LOG=info,zipnet_node=info \
~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-3.log
" C-m
Within 15–30 s, one of the three servers should log
committee: opening round at index I_1. That one is the current
Raft leader; the other two are followers. Which server wins the
election is not deterministic — do not special-case the first
server as “always the leader”.
Confirm the committee is healthy:
# on the host
grep -E 'zipnet up|leader|round' ~/zipnet/logs/server-*.log | tail -20
Step 6 — Start the aggregator
The aggregator is the only non-TDX process. It bootstraps against any committee server and must not be given the committee secret.
# on the host
tmux new-session -d -s zipnet-agg -n aggregator
tmux send-keys -t zipnet-agg:aggregator "
ZIPNET_INSTANCE=demo.tdx \
ZIPNET_SECRET=aggregator-seed \
ZIPNET_BOOTSTRAP=$BOOTSTRAP \
ZIPNET_FOLD_DEADLINE=2s \
RUST_LOG=info,zipnet_node=info \
~/zipnet/aggregator/zipnet-aggregator 2>&1 | tee ~/zipnet/logs/aggregator.log
" C-m
A healthy aggregator settles quickly and logs
aggregator booting; waiting for collections to come online
within a few seconds.
Step 7 — Start the reference client
# on the host
tmux new-session -d -s zipnet-c1 -n client-1
tmux send-keys -t zipnet-c1:client-1 "
ZIPNET_INSTANCE=demo.tdx \
ZIPNET_BOOTSTRAP=$BOOTSTRAP \
ZIPNET_MESSAGE='hello from ubuntu@tdx-host' \
ZIPNET_CADENCE=1 \
RUST_LOG=info,zipnet_node=info \
~/zipnet/client/zipnet-client-run-qemu.sh 2>&1 | tee ~/zipnet/logs/client-1.log
" C-m
Within one ZIPNET_ROUND_PERIOD (3s here) after the aggregator
bonds, the Raft leader should print:
INFO zipnet_node::committee: committee: opening round at index I_1
INFO zipnet_node::roles::server: submitted partial unblind at I_2
INFO zipnet_node::committee: committee: round finalized round=r1 participants=1
Step 8 — Verify end-to-end
From the host, tail all four log streams at once:
# on the host
tail -F ~/zipnet/logs/server-*.log ~/zipnet/logs/aggregator.log ~/zipnet/logs/client-1.log
You are looking for:
| Signal | Where | Meaning |
|---|---|---|
| `zipnet up: network=<universe> instance=demo.tdx` | every role | Universe join and instance binding succeeded. |
| `mosaik_groups_leader_is_local = 1` on exactly one server (Prometheus or log line) | server logs | Committee has a single Raft leader. |
| `aggregator: forwarded aggregate to committee round=rN participants=1` | aggregator | Client envelopes reached the aggregator and were folded. |
| `committee: round finalized round=rN participants=1` | whichever server is leader | End-to-end round closed; broadcast published into the Broadcasts collection. |
Once you see round finalized with a non-zero participants
count, the topology is working.
Cleanup
# on the host
for s in zipnet-s1 zipnet-s2 zipnet-s3 zipnet-agg zipnet-c1; do
tmux kill-session -t $s 2>/dev/null || true
done
Each TDX guest emits a departure announcement over gossip on SIGTERM, and Raft tolerates the loss as long as a majority remains; `tmux kill-session` sends SIGTERM to the foreground QEMU process, which in turn signals the guest.
If a guest is wedged, pkill -f zipnet-server-run-qemu.sh is safe
— all in-memory state is disposable in v1.
What to change for a real deployment
This example collapses a three-node committee onto one host to keep the runbook short. To roll the same shape into production:
- Replace `ubuntu@tdx-host` with three separate TDX hosts `ubuntu@tdx-1`, `ubuntu@tdx-2`, `ubuntu@tdx-3` run by three independent operators (or at minimum, with three independent blast radii). Geographic separation is the point.
- Run the aggregator on a fourth, non-TDX but well-connected host. Clients will often use it as a bootstrap; pick something with a stable address.
- Swap `tmux` for systemd unit files — one per role — so crash recovery is automatic. See Running a committee server for the full production env matrix.
- Bump `ZIPNET_MIN_PARTICIPANTS` to at least `2`. A single client produces no anonymity.
- Publish the instance name, universe `NetworkId`, and the two MR_TDs (`$SERVER_MRTD`, `$CLIENT_MRTD`) to your users through release notes or a signed announcement. That is the entire onboarding handoff; see What you need from the operator for the matching reader side.
See also
- Quickstart — stand up an instance — the conceptual walk-through this page makes concrete.
- Running a committee server — every env var and metric for the server role.
- Running the aggregator — capacity planning and the single-aggregator caveat.
- Running a client — the reference client you ship to publishers.
- Rotations and upgrades — rolling a new MR_TD, rotating the committee secret, retiring an instance.
- Monitoring and alerts — what to watch once the topology above is in production.
Running a committee server
audience: operators
A committee server joins the Raft group that orchestrates the
instance’s rounds, holds one of the X25519 keys used to unblind the
broadcast vector, and publishes its public bundle into the replicated
ServerRegistry. In production it runs inside a TDX guest built from
the mosaik image builder; see the
Quickstart TDX section.
One-shot command
ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_COMMITTEE_SECRET="your-committee-secret" \
ZIPNET_SECRET="stable-node-seed" \
ZIPNET_MIN_PARTICIPANTS=2 \
ZIPNET_ROUND_PERIOD=3s \
ZIPNET_ROUND_DEADLINE=15s \
./zipnet-server --bootstrap <peer_id_of_another_server>
On a fresh universe with no existing seed peers, start the first
server without --bootstrap, grab the peer=… value printed at
startup, and pass it as --bootstrap to the remaining servers. Every
subsequent server, aggregator, or client can be bootstrapped off any
one of them. After the universe has settled, the mosaik discovery
layer finds peers on its own and the bootstrap hint is only needed
for cold starts.
Environment variables
The full list lives in Environment variables. The ones you will actually set in production:
| Variable | Meaning | Notes |
|---|---|---|
| `ZIPNET_INSTANCE` | Instance name this server serves | Required. Short, stable, namespaced (e.g. `acme.mainnet`). Must match across the whole deployment. |
| `ZIPNET_UNIVERSE` | Universe override | Optional. Leave unset to use `zipnet::UNIVERSE` (the shared mosaik universe). Set only for isolated federations. |
| `ZIPNET_COMMITTEE_SECRET` | Shared committee admission secret | Treat as a root credential. Identical on every committee member of this instance. |
| `ZIPNET_SECRET` (or `--secret`) | Seed for this node’s stable PeerId | Unique per node. Anything not 64-hex is blake3-hashed. |
| `ZIPNET_BOOTSTRAP` | Peer IDs to dial on startup | Helpful on cold universes; unnecessary once discovery has converged. |
| `ZIPNET_MIN_PARTICIPANTS` | Minimum clients before the leader opens a round | Default 1. Set to at least 2 for meaningful anonymity. |
| `ZIPNET_ROUND_PERIOD` | How often the leader attempts to open a round | e.g. `2s`, `500ms`. |
| `ZIPNET_ROUND_DEADLINE` | Max time a round may stay open | e.g. `15s`. The leader will force-advance a stuck round. |
| `ZIPNET_METRICS` | Bind address for the Prometheus exporter | e.g. `0.0.0.0:9100`. |
| `RUST_LOG` | Log filter | Sane default: `info,zipnet_node=info,mosaik=warn`. |
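The `ZIPNET_SECRET` normalization rule in the table — a 64-hex value is used as the raw 32-byte seed, anything else is hashed down to 32 bytes first — can be sketched as follows. The FNV-style fold below is a stand-in only; the real implementation uses blake3.

```rust
/// Illustrative sketch of the documented ZIPNET_SECRET rule: 64 hex chars
/// decode directly to the 32-byte seed; anything else is hashed to 32 bytes
/// (blake3 in the real implementation; an FNV-1a fold stands in here).
fn seed_from_secret(secret: &str) -> [u8; 32] {
    let is_64_hex = secret.len() == 64 && secret.chars().all(|c| c.is_ascii_hexdigit());
    let mut out = [0u8; 32];
    if is_64_hex {
        for (i, chunk) in secret.as_bytes().chunks(2).enumerate() {
            let hi = (chunk[0] as char).to_digit(16).unwrap() as u8;
            let lo = (chunk[1] as char).to_digit(16).unwrap() as u8;
            out[i] = hi << 4 | lo;
        }
    } else {
        // Stand-in hash: NOT blake3, just enough to show the shape.
        let mut acc: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
        for (i, b) in secret.bytes().cycle().take(32 * 8).enumerate() {
            acc = (acc ^ b as u64).wrapping_mul(0x100_0000_01b3); // FNV prime
            if i % 8 == 7 {
                out[i / 8] = (acc & 0xff) as u8;
            }
        }
    }
    out
}

fn main() {
    // A 64-hex secret decodes to itself, byte for byte.
    let hex = "00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff";
    assert_eq!(seed_from_secret(hex)[0], 0x00);
    assert_eq!(seed_from_secret(hex)[1], 0x11);
    // A human-readable secret is hashed: same input, same seed; distinct
    // inputs, distinct seeds (hence distinct PeerIds per node).
    assert_eq!(seed_from_secret("server-1-seed"), seed_from_secret("server-1-seed"));
    assert_ne!(seed_from_secret("server-1-seed"), seed_from_secret("server-2-seed"));
}
```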
Naming the instance
Instance names share a flat namespace per universe. Two operators
picking the same name collide in the same committee group and
neither deployment works — mosaik has no way to prevent or detect
this. Namespace aggressively: <org>.<purpose>.<env>, for example
acme.mixer.mainnet. If unsure, add a random suffix once and forget
about it (acme.mixer.mainnet.8f3c1a).
What a healthy startup looks like
INFO zipnet_server: spawning zipnet server server=a2095bed48
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=f5e28a69e6... role=3b37e5d575...
INFO zipnet_node::roles::server: server booting; waiting for collections + group
INFO zipnet_node::committee: committee: opening round at index I_1
INFO zipnet_node::roles::server: submitted partial unblind at I_2
INFO zipnet_node::committee: committee: round finalized round=r1 participants=N
A server that has been up for more than a minute and has not printed
round finalized yet is almost always waiting on one of:
- Client count below `ZIPNET_MIN_PARTICIPANTS`. Check the aggregator’s `zipnet_client_registry_size` metric.
- Committee group has not elected a leader. Check `mosaik_groups_leader_is_local` on each server; exactly one should be 1.
- Bundle tickets not replicated. See Incident response — stuck rounds.
Resource profile
A single-slot round at the default RoundParams (64 slots × 256
bytes = 16 KiB broadcast vector) with 100 clients uses roughly:
- CPU: a burst of ~5 ms per round per client (pad derivation dominates).
- RAM: O(N) client bundles × 64 bytes + a ring buffer of recent aggregates.
- Network: inbound one aggregate envelope per round (+ Raft heartbeat traffic between servers), outbound one partial per round + Raft replication to followers.
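The 16 KiB figure is just the product of the default round parameters:

```rust
fn main() {
    // Default RoundParams quoted above.
    let (num_slots, slot_bytes) = (64usize, 256usize);
    let vector_bytes = num_slots * slot_bytes;
    assert_eq!(vector_bytes, 16_384);      // 64 × 256 bytes
    assert_eq!(vector_bytes / 1024, 16);   // = 16 KiB broadcast vector
}
```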
Graceful shutdown
Send SIGTERM. The server emits a departure announcement over
gossip so peers learn within the next announce cycle (default 15 s)
that it is gone. Raft proceeds with the remaining quorum provided a
majority is still up.
Availability warning
In v1, any committee server going offline halts round progression because the state machine waits for one partial per server listed in the round’s roster. This is by design — the paper’s any-trust model prioritizes correctness over liveness. A v2 improvement is sketched in Roadmap to v2.
See also
- Running the aggregator — the other always-on node.
- Rotations and upgrades — rolling restarts, key rotation, adding/removing members.
- Monitoring and alerts — what to put on your dashboard.
- Incident response — when things go wrong.
Running the aggregator
audience: operators
The aggregator receives every client envelope for the live round,
XORs them into a single AggregateEnvelope, and forwards that to the
committee. It is untrusted for anonymity — compromising it only
affects liveness and round-membership accounting, never whether a
message can be linked to its sender. It is trusted for liveness:
if it stops, rounds stop.
In v1 there is exactly one aggregator per instance. It does not need to run inside a TDX guest (though you can if your ops story prefers uniformity).
One-shot command
```shell
ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_SECRET="stable-agg-seed" \
ZIPNET_FOLD_DEADLINE=2s \
./zipnet-aggregator --bootstrap <peer_id_of_a_committee_server>
```
Environment variables
| Variable | Meaning | Notes |
|---|---|---|
| `ZIPNET_INSTANCE` | Instance name this aggregator serves | Required. Must match the committee’s. Typos show up as `ConnectTimeout` at round-open time. |
| `ZIPNET_UNIVERSE` | Universe override | Optional; leave unset to use the shared universe. |
| `ZIPNET_SECRET` (or `--secret`) | Seed for this aggregator’s stable PeerId | Strongly recommended: clients often use the aggregator as a discovery bootstrap. |
| `ZIPNET_BOOTSTRAP` | Peer IDs to dial on startup | At least one committee server on a cold universe. |
| `ZIPNET_FOLD_DEADLINE` | Time window to collect envelopes after a round opens | Default 2s. Raising it admits slower clients at the cost of latency. |
| `ZIPNET_METRICS` | Prometheus bind address | Optional. |
The aggregator does not take ZIPNET_COMMITTEE_SECRET. It is
outside the committee’s trust boundary by design; do not give it
that secret even if your secret store makes it convenient.
What a healthy aggregator log looks like
```
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=4c210e8340... role=5ef6c4ada2...
INFO zipnet_node::roles::aggregator: aggregator booting; waiting for collections to come online
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r1 participants=3
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r2 participants=3
...
```
Capacity planning
Per round the aggregator:
- Receives N × B bytes from clients, where N is the number of active clients and B is the broadcast vector size (defaults to 16 KiB).
- Sends one aggregate of size B to every committee server.
If the committee is 5 servers and the instance has 1000 clients with default parameters:
- Inbound per round ≈ 1000 × 16 KiB = 16 MiB.
- Outbound per round ≈ 5 × 16 KiB = 80 KiB.
At a 2 s round cadence, inbound averages 64 Mbit/s. Provision accordingly.
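The arithmetic generalizes to other fleet sizes and cadences. A quick sanity-check sketch — the function is ours for illustration, not part of the SDK:

```rust
/// Average inbound bandwidth at the aggregator, in Mibit/s.
/// `clients` envelopes of `vector_bytes` each arrive once per round of `round_secs`.
fn inbound_mibit_per_s(clients: u64, vector_bytes: u64, round_secs: f64) -> f64 {
    (clients * vector_bytes * 8) as f64 / round_secs / (1024.0 * 1024.0)
}

fn main() {
    // 1000 clients × 16 KiB broadcast vector at a 2 s cadence
    let rate = inbound_mibit_per_s(1000, 16 * 1024, 2.0);
    println!("{rate:.1} Mibit/s"); // ≈ 62.5, in line with the ~64 Mbit/s figure above
}
```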
Graceful shutdown
Send `SIGTERM`. Envelopes that had not yet been folded into the
current round’s aggregate are dropped on the floor; the affected
clients retry automatically on the next round.
Because the aggregator is a single point of failure for liveness in
v1, plan restarts against your monitoring: a round stall of
3 × ROUND_PERIOD + ROUND_DEADLINE triggers the stuck-round alert
documented in Monitoring.
What if I want two aggregators?
Not supported in v1. Running two on the same instance name gets you two processes competing for the submit stream, not load-balancing. If you need redundancy today, fail over with a warm-standby host behind a process supervisor — not two live aggregators. A multi-tier aggregator tree is sketched in Roadmap to v2 — Multi-tier aggregators.
See also
- Running a committee server
- Incident response — aggregator crash-loop, OOM, partition handling.
- Monitoring and alerts — aggregator-relevant metrics.
Running a client
audience: operators
The typical zipnet publisher is an external user running their own
TDX-attested agent — you don’t operate those. This page is about the
reference zipnet-client binary you ship to publishers (or run
yourself for a bundled wallet, a cover-traffic filler, or a
smoke-test participant).
A client generates an X25519 keypair, publishes its public bundle via gossip, and seals one envelope per round. In production every client runs inside a TDX guest whose MR_TD matches the value your committee pinned; see the TDX section of the Quickstart.
One-shot command
```shell
ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_MESSAGE="payload-to-broadcast" \
./zipnet-client --bootstrap <peer_id_of_aggregator_or_server>
```
Omit ZIPNET_MESSAGE to run a cover-traffic client that participates
in every round with a zero payload. Cover traffic is the operator’s
tool for raising the effective anonymity set size when real
publishers are sparse.
Environment variables
| Variable | Meaning | Notes |
|---|---|---|
| `ZIPNET_INSTANCE` | Instance name to bind to | Required. Same string the committee uses; typos show up as `ConnectTimeout`. |
| `ZIPNET_UNIVERSE` | Universe override | Optional; leave unset to use the shared universe. |
| `ZIPNET_BOOTSTRAP` | Peer IDs to dial on startup | Aggregator’s PeerId or any committee server’s. Needed only on cold networks. |
| `ZIPNET_MESSAGE` | UTF-8 message to seal per round | Truncate it yourself to fit `slot_bytes − tag_len`. Default slot width is 240 bytes of user payload. |
| `ZIPNET_CADENCE` | Talk every Nth round | Default 1. Useful for dialing your own talk/cover ratio. |
| `ZIPNET_METRICS` | Prometheus bind address | Optional. |
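Truncation must respect UTF-8 code-point boundaries or the payload stops being valid UTF-8. A minimal sketch — the helper is ours, not part of `zipnet-client`, and 240 assumes the default slot width:

```rust
/// Truncate a message to at most `max` bytes without splitting a UTF-8
/// code point. 240 = the default slot_bytes − tag_len payload budget.
fn truncate_utf8(msg: &str, max: usize) -> &str {
    if msg.len() <= max {
        return msg;
    }
    let mut end = max;
    // Walk back until we land on a char boundary.
    while !msg.is_char_boundary(end) {
        end -= 1;
    }
    &msg[..end]
}

fn main() {
    let long = "é".repeat(200); // 400 bytes of two-byte code points
    let cut = truncate_utf8(&long, 240);
    assert!(cut.len() <= 240);
    assert_eq!(cut.chars().count(), 120); // cut cleanly on a boundary
    println!("{} bytes kept", cut.len());
}
```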
Building the TDX image you ship to publishers
Publishers to a TDX-gated instance need to run your client image (not their own ad-hoc build), because the committee will reject any client whose quote doesn’t match the pinned MR_TD. Build it the same way you build the server image — mosaik ships the builder:
```rust
// crates/zipnet-client/build.rs
fn main() {
    mosaik::tee::tdx::build::alpine()
        .with_default_memory_size("512M")
        .build();
}
```
```toml
# crates/zipnet-client/Cargo.toml
[dependencies]
mosaik = { version = "0.3", features = ["tdx"] }

[build-dependencies]
mosaik = { version = "0.3", features = ["tdx-builder-alpine"] }
```
Alpine is the usual choice for clients — ~5 MB versus Ubuntu’s
~25 MB — unless your agent has a specific glibc dependency. After
cargo build --release the artifacts land under
target/release/tdx-artifacts/zipnet-client/alpine/:
| Artifact | What it’s for |
|---|---|
| `zipnet-client-run-qemu.sh` | Self-extracting launcher publishers invoke on a TDX host. |
| `zipnet-client-mrtd.hex` | The 48-byte measurement. You pin this in the committee and publish it to readers. |
| `zipnet-client-vmlinuz` | Raw kernel, for repackaging. |
| `zipnet-client-initramfs.cpio.gz` | Raw initramfs. |
| `zipnet-client-ovmf.fd` | Raw OVMF firmware. |
Publish zipnet-client-mrtd.hex alongside your release notes. It
goes into the committee’s Tdx::require_mrtd(...) configuration and
into readers’ verification code. See
Rotations and upgrades
for rolling a new MR_TD without downtime.
What a healthy client log looks like
```
INFO zipnet_client: spawning zipnet client client=550fda1ffa
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=c2e9aeee0e... role=a8b7ed5911...
INFO zipnet_node::roles::client: client booting; waiting for rosters
```
After boot, every sealed envelope is a DEBUG event. Raise
RUST_LOG to debug,zipnet_node=debug to see them.
Why a client’s envelope might get dropped
- The client bundle hasn’t replicated yet. The first few rounds after a client connects may not include it in `ClientRegistry`. Wait for `zipnet_client_registered` to flip to 1 before relying on anonymity guarantees.
- Slot collision with another client. v1’s slot assignment is a deterministic hash — two clients occasionally pick the same slot and XOR their messages into garbage. Neither falsification tag verifies, the committee still publishes the broadcast, the messages are lost, and the clients retry next round. A 4×-oversized scheduling vector in v2 makes this rare.
- Message is longer than `slot_bytes − tag_len`. The client exits with `MessageTooLong`. Shorten the message, or raise `slot_bytes` at the instance level (which retires the instance — see Rotations and upgrades).
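The collision failure mode falls directly out of XOR aggregation. A toy illustration with an 8-byte slot and made-up payloads:

```rust
// Two clients writing into the same slot XOR their messages together;
// the published slot is neither message, and both are unrecoverable.
fn xor_into(slot: &mut [u8], msg: &[u8]) {
    for (s, m) in slot.iter_mut().zip(msg) {
        *s ^= m;
    }
}

fn main() {
    let mut slot = [0u8; 8];
    xor_into(&mut slot, b"AAAAAAAA"); // client 1's message
    xor_into(&mut slot, b"BBBBBBBB"); // client 2 picked the same slot
    assert_ne!(&slot, b"AAAAAAAA");
    assert_ne!(&slot, b"BBBBBBBB");
    assert_eq!(slot, [0x03; 8]); // 'A' ^ 'B' = 0x03: garbage for both
    println!("collided slot: {slot:?}");
}
```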
Identity lifetime
In the mock path (TDX disabled), each process run generates a fresh X25519 identity — run-to-run unlinkability is free. In the TDX path, the identity lives in sealed storage inside the enclave so a restart preserves it; useful for reputation systems, but means the same enclave is recognizable across runs. Design accordingly when you pick a cover-traffic cadence.
See also
- Running a committee server
- Rotations and upgrades — rebuilding the client image and rolling a new MR_TD.
- Security posture checklist — client-host hygiene, TDX expectations.
Rotations and upgrades
audience: operators
Every routine change in a running instance falls into one of these procedures. Follow them verbatim; the consensus and crypto are unforgiving about accidental divergence.
Rolling a committee server (restart, same identity)
Safe any time. Minority-restart is handled by Raft automatically.
1. Stop the target server with `SIGTERM`. Wait for graceful exit (under 5 s).
2. Replace the binary / restart the container / whatever triggered the rollout.
3. Start the server with the same `ZIPNET_INSTANCE`, `ZIPNET_SECRET`, and `ZIPNET_COMMITTEE_SECRET` as before.
4. Observe `mosaik_groups_leader_is_local` on the remaining servers — election should settle within a few seconds.
5. Once the restarted server’s log shows `round finalized`, move to the next one.
Do not restart a majority of the committee simultaneously — that drops quorum and halts round progression until a majority is back up.
Adding a committee server
1. Provision the new node. Assign it a fresh `ZIPNET_SECRET` seed.
2. Distribute the same `ZIPNET_INSTANCE` and `ZIPNET_COMMITTEE_SECRET` to it.
3. Start it with `--bootstrap <peer_id_of_any_existing_server>`.
4. Wait for the new server’s log to print `round finalized` — it has caught up.
5. Update your operational runbook, monitoring targets, and audit log to reflect the added node.
The ServerRegistry collection automatically reflects the new
member within one round. Clients start including the new server in
their pad derivation from the next OpenRound the leader issues.
Removing a committee server
1. Announce the removal at least one gossip cycle ahead (default 15 s) so catalog entries expire cleanly.
2. `SIGTERM` the target node.
3. Verify the remaining servers still form a majority and continue to finalize rounds (`round finalized` events in the logs).
Security warning
A removed server retains its DH secret. If that secret is not wiped, an adversary who later compromises the decommissioned machine can replay historic rounds and compute that server’s share of past pads. Combined with any other committee server’s DH secret compromise, this would break anonymity of past rounds. Wipe DH secrets on decommission.
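A decommission step can be scripted. This sketch uses GNU `shred` on a throwaway temp file; the real path to the DH secret depends entirely on your deployment layout, and on journaled or copy-on-write filesystems `shred`’s overwrite is best-effort:

```shell
# wipe_secret: overwrite the file, sync, then unlink it.
wipe_secret() {
  shred --iterations=3 --remove=wipesync "$1"
}

# Demonstrate on a temp file standing in for the server's DH secret.
tmp=$(mktemp)
echo "dh-secret-material" > "$tmp"
wipe_secret "$tmp"
test ! -e "$tmp" && echo "wiped"
```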
Rotating a committee server’s long-term key
v1 does not have first-class key rotation. The procedure is “decommission + re-add”:
1. Remove the old server (above).
2. Add a new server with a fresh `ZIPNET_SECRET` (above).
The committee’s GroupId does not change (it depends on the
instance name and shared ZIPNET_COMMITTEE_SECRET, not on
individual node identities), so the Raft group persists across the
swap. The ServerRegistry entry is updated automatically.
Rotating the committee secret
This is disruptive: changing ZIPNET_COMMITTEE_SECRET changes the
GroupId, so the old committee is abandoned. External publishers
compiled against the instance name still bond, but the committee
they find is new.
1. Announce a maintenance window.
2. Stop every client, aggregator, and committee server on this instance.
3. Distribute the new `ZIPNET_COMMITTEE_SECRET` to all committee members.
4. Start the committee first, then the aggregator, then the clients.
Rotating round parameters
RoundParams (num_slots, slot_bytes, tag_len) is folded into
the committee’s state-machine signature. Changing it is equivalent
to rotating the committee secret (above), and it is a breaking
change for any publisher that compiled the old parameters in —
meaning in practice you bump the instance.
See Retiring and replacing an instance below.
Dev note
Developers changing `RoundParams` in code must also bump the signature string in `CommitteeMachine::signature()` when appropriate — otherwise old and new nodes silently derive the same `GroupId` but disagree on apply semantics. See The committee state machine.
Rebuilding a TDX image
Rebuilding the committee or client image produces a new MR_TD. The committee’s ticket validator is pinned to a specific MR_TD, so a rebuild requires coordinated rollout:
1. Build the new image with `cargo build --release` (the mosaik TDX builder runs in `build.rs`, producing a fresh `mrtd.hex`).
2. Publish the new `mrtd.hex` to your release-notes channel.
3. Decide whether the change is ABI-compatible with the current committee’s expectations:
   - Patch-level image change (kernel patch, initramfs tweak, no wire-format or state-machine change): accept both MR_TDs transiently by updating the committee’s `require_mrtd` list to include the new hash, roll the committee hosts one at a time to the new image, then drop the old MR_TD from the allow-list.
   - Breaking change (new state-machine signature, new wire format, new `RoundParams`): treat it as retiring the instance (below).
4. Sign and publish the new MR_TD, along with the retirement window for the old one, so publishers can rebuild their own images in time.
Retiring and replacing an instance
Use this path whenever a cross-compatibility boundary moves
(RoundParams, CommitteeMachine::signature, wire format, breaking
MR_TD change). You have two idiomatic versioning stories:
- Version in the name. Stand up the new deployment under a new instance name (`acme.mainnet.v2`). Old and new run in parallel for the transition window; publishers re-pin and rebuild at their own pace; you tear down the old instance when traffic has drained. The cleanest story for external publishers; it forces them to cut a release.
- Lockstep release against a shared deployment crate. Keep the instance name stable, cut a new deployment-crate version pinning the new state-machine signature, and coordinate operator + publisher upgrades as a single release event. Avoids instance-ID churn at the cost of tighter release-cadence coupling.
Zipnet v1 does not mandate which you pick; see Designing coexisting systems on mosaik — Versioning under stable instance names for the full tradeoff.
Retirement itself is just stopping every server under the old
instance name. Publishers still trying to bond see ConnectTimeout;
they rebuild against the new name or the new deployment crate and
reconnect.
Upgrading the binary
Patch-level upgrades (no CommitteeMachine::signature change, no
RoundParams change, no wire format change, no MR_TD change if
TDX-gated) are safe to roll one node at a time following the restart
procedure.
Upgrades that change any of those four cross a compatibility boundary — treat them like retiring the instance.
Dev notes on where to look in source:
- `WIRE_VERSION` in `crates/zipnet-proto/src/lib.rs`
- `CommitteeMachine::signature` in `crates/zipnet-node/src/committee.rs`
- `RoundParams::default_v1` in `crates/zipnet-proto/src/params.rs`
Any change to those requires a coordinated restart of the whole instance.
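One way to catch accidental divergence in CI is to fold the compatibility inputs into a single string and diff it across releases. A hypothetical sketch — the function and values are ours; the names merely mirror the source locations listed above:

```rust
// Combine the compatibility-boundary inputs into one comparable string.
// If this differs between two builds, a rolling restart is NOT safe.
fn compat_fingerprint(wire_version: u32, machine_sig: &str, num_slots: u32, slot_bytes: u32) -> String {
    format!("wire={wire_version};sig={machine_sig};slots={num_slots}x{slot_bytes}")
}

fn main() {
    let current = compat_fingerprint(1, "zipnet-committee-v1", 64, 256);
    let next = compat_fingerprint(1, "zipnet-committee-v1", 64, 256);
    assert_eq!(current, next); // identical => patch-level, roll one node at a time
    println!("{current}");
}
```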
See also
- Running a committee server
- Incident response — what to do when a restart doesn’t bring the node back cleanly.
- Designing coexisting systems on mosaik — Versioning under stable instance names
Monitoring and alerts
audience: operators
Zipnet inherits mosaik’s Prometheus exporter. Enable it by setting
ZIPNET_METRICS=0.0.0.0:9100 (or a port of your choice) on every
node you want scraped. See Metrics reference
for the complete list; this page covers the metrics that actually
tell you whether an instance is healthy.
All zipnet-emitted metrics carry an instance="<name>" label set
from ZIPNET_INSTANCE. Scope your alert rules on that label so a
stuck preview.alpha doesn’t page the on-call for acme.mainnet.
The three questions you ask every shift
1. “Are rounds finalizing?”
The authoritative signal is new entries appearing in the
Broadcasts collection. Track the rate of round finalized log
events on committee servers (INFO level). A healthy instance
finalizes one round per ZIPNET_ROUND_PERIOD interval, plus or
minus ZIPNET_FOLD_DEADLINE.
Alert condition: no round finalized event on a leader server for
3 × ROUND_PERIOD + ROUND_DEADLINE.
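Expressed as a Prometheus rule, assuming the `zipnet_round_finalized_total` counter referenced in the dashboard section below and a ~2 s round period (tune the windows to your own `ROUND_PERIOD` and `ROUND_DEADLINE`):

```yaml
groups:
  - name: zipnet-acme-mainnet
    rules:
      - alert: ZipnetRoundStall
        # No round finalized in the last minute: well past 3 × ROUND_PERIOD + ROUND_DEADLINE.
        expr: increase(zipnet_round_finalized_total{instance="acme.mainnet"}[1m]) == 0
        for: 1m
        labels:
          severity: page
```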
2. “Is the committee healthy?”
- Exactly one committee server in this instance should report itself as leader at any one time. If zero or two-plus, investigate (see Incident response — split-brain). The relevant metric is `mosaik_groups_leader_is_local{instance="…"}`.
- Bond count per server should equal `N − 1`, where N is the committee size. A dropped bond suggests a universe-level partition or an expired ticket.
- Raft log position should advance in lockstep across servers. A persistent lag (> 5 indices) on one server indicates that node is falling behind.
3. “Are clients and their pubkeys reaching the committee?”
- `ClientRegistry` size ≈ number of clients you launched for this instance, give or take gossip cycles.
- Per-round `participants` count in `round finalized` events ≈ the number of non-idle clients.
Alert condition: participants = 0 for two consecutive rounds while
you expected > 0.
Useful log filters
On committee servers:
```shell
journalctl -u zipnet-server@acme-mainnet -f \
  --grep='round finalized|opening round|submitted partial|SubmitAggregate|rival group leader'
```
On the aggregator:
```shell
journalctl -u zipnet-aggregator@acme-mainnet -f \
  --grep='forwarded aggregate|registering client'
```
On clients:
```shell
journalctl -u zipnet-client@acme-mainnet -f \
  --grep='sealed envelope|registration'
```
(Adjust for your process supervisor.)
Baseline expectations at default parameters
| Condition | Committee server | Aggregator | Client |
|---|---|---|---|
| Steady-state CPU | < 5 % on a mid-range core | varies with client count | < 1 % |
| RAM | 50–200 MB | 100–500 MB | 20–50 MB |
| Bond count | committee_size − 1 | 0 (not a group member) | 0 |
| Gossip catalog size | total universe node count ± 2 | total universe node count ± 2 | total universe node count ± 2 |
| Inbound per round | N × B / committee_size (replication) | N × B | B / client |
| Outbound per round | B + heartbeats | committee_size × B | B |
N = clients, B = broadcast vector bytes (default 16 KiB).
Dev note
The gossip catalog includes peers from every service on the shared universe, not just zipnet. Your catalog size may be much larger than your committee size if the universe also hosts multisig signers, oracles, or other mosaik agents. Do not alert on absolute catalog size; alert on change in catalog size relative to a baseline.
Sensible alerts to configure
- Round stall. No new `Broadcasts` entry for `3 × ROUND_PERIOD + ROUND_DEADLINE`. Page on-call: committee is stuck, aggregator is down, or `min_participants` is unmet.
- Committee partition. `sum by (instance) (mosaik_groups_leader_is_local{instance="…"})` is 0 or ≥ 2 for more than 1 minute. Page on-call.
- TDX attestation approaching expiry. Less than 24 h to ticket `exp` on any bonded peer. Page TEE operator.
- Bond drop. `mosaik_groups_bonds{peer=<known>,instance="…"}` drops from 1 to 0 for more than 30 s and does not recover.
Multi-instance dashboards
Since multiple instances share the same universe and the same host
fleet, build dashboards with instance as a dimension from the
start:
- A top-level panel showing `rate(zipnet_round_finalized_total[1m])` broken out by `instance`.
- A committee-health grid: rows are instances, columns are the committee members, cells are `mosaik_groups_leader_is_local`.
- A per-instance heatmap of `participants` over time — sparse rounds are often the first hint of a sick publisher fleet.
A starter Grafana dashboard is not shipped in v1. The metrics list in Metrics reference is sufficient to build one. A community-maintained dashboard is tracked as a v2 follow-up.
See also
- Incident response — what to check when an alert fires.
- Metrics reference — the full label and metric list.
Incident response
audience: operators
This page is a runbook. It lists the failure modes we have actually observed in testing and the minimal steps that resolve each. Each section is scoped to a single instance — if multiple instances on the same universe are misbehaving at once, something is wrong at the universe level (relays, DHT, network) rather than in any one instance, and you should start with the “Discovery is slow” section.
Stuck rounds
Symptom: no round finalized log on any committee server in this
instance for more than 3 × ROUND_PERIOD + ROUND_DEADLINE.
Root-cause checklist, in order of likelihood:
1. Fewer active clients than `ZIPNET_MIN_PARTICIPANTS`. The leader won’t open a round until this threshold is met.
   - Check: `zipnet_client_registry_size{instance="…"}` on any committee server.
   - Fix: either start more clients (or a cover-traffic filler) or lower `ZIPNET_MIN_PARTICIPANTS` (rolling restart of the committee — this is in the state machine’s signature derivation, so everyone needs the same value).
2. Committee has no leader. Raft election has not settled (yet, or ever).
   - Check: `mosaik_groups_leader_is_local{instance="…"} == 0` on all members.
   - Fix: usually self-heals within `ELECTION_TIMEOUT + BOOTSTRAP_DELAY`. If persistent, suspect clock skew or a full network partition.
3. Client bundles have not replicated to the committee. Clients have connected but their bundles haven’t landed in `ClientRegistry` — the aggregator hasn’t yet mirrored them in.
   - Check: aggregator log for `registering client bundle`; this should fire for each new client.
   - Fix: ensure the aggregator is reachable from every client (correct `ZIPNET_BOOTSTRAP` or working universe discovery). Wait one gossip cycle (≈ 15 s).
4. One or more server bundles missing from `ServerRegistry`. A committee server failed to self-publish.
   - Check: query `ServerRegistry` size on each committee server; it should equal the committee size.
   - Fix: restart the offending server; it re-publishes on boot.
If a publisher reports Error::ConnectTimeout that traces back to
any of the root causes above, it is an operator-side issue
surfacing as a user-side error. The SDK cannot distinguish “my
instance name is wrong” from “the operator’s committee is stuck” —
that’s a deliberate tradeoff of the no-registry design.
Split-brain
Symptom: two or more committee servers in this instance report
mosaik_groups_leader_is_local == 1, or a server’s log shows
rival group leader detected.
v1 uses mosaik’s modified Raft which resolves rivals by term. The
system self-heals within one ELECTION_TIMEOUT. If it does not
self-heal:
- Check clock skew across committee members (`ntpdate -q` on each). More than a few seconds of skew breaks Raft timing.
- Check the network — split-brain persisting past self-heal is a partition.
- As a last resort, `SIGTERM` the minority faction. They’ll rejoin as followers.

Do not change `ZIPNET_COMMITTEE_SECRET` mid-incident. It would force a fresh committee group and hide evidence of the split, not resolve it.
Committee quorum loss
Symptom: fewer than a majority of committee servers are reachable. Rounds cannot commit.
- Restore the failed nodes. They rejoin on startup.
- If restoration is impossible (hardware loss, etc.), a v1 deployment has no graceful recovery — retire the instance and stand up a fresh one under a new name (or bump the deployment crate version). See Rotations and upgrades — Retiring and replacing an instance.
Aggregator crash-loop
Symptom: aggregator exits or OOMs shortly after boot.
Most common cause in v1: too many concurrent clients pushing
envelopes larger than the internal buffer (buffer_size = 1024 per
mosaik default).
Fix: either lower client concurrency by splitting the publisher
fleet across multiple instances (each with its own
`ZIPNET_INSTANCE`), or tune the aggregator’s stream buffer when
calling
`network.streams().consumer::<ClientEnvelope>().with_buffer_size(N)`
— this requires a code change in zipnet-node (dev task).
TDX attestation expiry
Symptom: committee rejects a previously-good peer with
unauthorized; the peer re-bonds in a loop with the same outcome.
On the peer side, logs mention an expired quote.
Causes, in order of likelihood:
1. Quote `exp` elapsed. Each TDX quote carries an expiration. The bonded peer needs a fresh quote.
   - Fix: restart the peer. On restart the TDX layer fetches a new quote from the hardware. If the peer still fails, check the TDX host’s attestation-service reachability.
2. Clock skew between the peer and the committee. The committee rejects a quote whose `exp` has already passed on its local clock.
   - Fix: NTP on both sides.
3. MR_TD mismatch. The peer is running a different image than the committee expects. Common after a committee rebuild the peer hasn’t yet picked up.
   - Fix: rebuild the peer image from the current release, or see Rotations and upgrades — Rebuilding a TDX image for the transition plan.
Discovery is slow (universe-level)
Symptom: nodes log Could not bootstrap the routing table and take
minutes to find each other. Typically affects all instances on
the same universe simultaneously.
Usual cause: iroh’s pkarr / Mainline DHT bootstrap is struggling (common on fresh residential networks or a fresh universe). Workarounds:
- Pass an explicit `ZIPNET_BOOTSTRAP=<peer_id>` on every non-bootstrap node.
- Enable mDNS discovery (already on by default in this prototype). For LAN deployments this is often enough.
- Run a mosaik bootstrap node (see mosaik’s `examples/bootstrap.rs`) with a well-known public address and seed it everywhere.
A dedicated bootstrap node is recommended for any production universe that hosts more than one zipnet instance.
When to escalate
- Unknown log messages containing `committed` or `reverted` outside the expected Raft lifecycle.
- `Broadcasts` collection contains entries where the number of servers in the record does not match your configured committee size for this instance.
- Any indication that two clients with the same `ClientId` coexist (would mean someone forged a bundle — investigate as a security incident).
- Publishers reporting `WrongUniverse` — indicates an operator misconfiguration of `ZIPNET_UNIVERSE`, or a publisher using the wrong `zipnet::UNIVERSE` constant.
See also
- Monitoring and alerts — the alerts that surface these conditions.
- Rotations and upgrades — controlled changes that avoid these incidents in the first place.
Accounting and audit
audience: operators
Anonymous broadcast looks, from the outside, uncomfortably like a
thing you cannot account for. Auditors will ask. This page tells
you what you can attest to, what you cannot, and how to produce
evidence for each. Everything here is scoped to a single zipnet
instance — multiple instances on the same universe are separately
audited against their own committee roster and Broadcasts
collection.
What the protocol is designed to guarantee
- Given at least one honest committee server, no party — not the operator, not the aggregator, not the remaining committee members, not an outside observer of the network — can determine which client authored which published broadcast.
- Given all parties operating the protocol honestly, every broadcast in the `Broadcasts` log is the XOR-sum of the messages of the clients listed in that round’s `participants` field, subject to slot collisions.
- Committed broadcasts are signed-in-transit by every bonded pair and logically signed by the Raft leader at commit time. Replays are detectable.
What the protocol is not designed to guarantee
- Who an individual `ClientId` refers to. A client’s `ClientId` is a hash of its X25519 public key, not a legal identity. You will need an out-of-band registration process if you want to tie a `ClientId` to a legal entity.
- That a broadcast is well-formed. A malicious client can put garbage in its slot. The falsification tag protects honest clients from other clients corrupting their slot, but not from a client corrupting its own slot.
- Censorship-resistance. A malicious aggregator or a majority of malicious committee servers can delay or drop rounds. Anonymity still holds; availability does not.
What you can attest to
“Did this instance publish this broadcast on this date?”
Every entry in the instance’s Broadcasts collection carries:
- `round: RoundId`
- `participants: Vec<ClientId>` — snapshot of the active clients at round-open time
- `servers: Vec<ServerId>` — committee members that contributed partials
- `broadcast: Vec<u8>` — the final XORed vector
Together with the Raft commit index, this is a point-in-time claim
signed (through the bond layer) by every committee server. Archive
the Broadcasts entries you care about, keyed by instance name —
there is no authoritative external registry.
“Who was running which node on this date?”
This is an organizational fact, not a cryptographic one. Maintain an external table per instance:
| Instance | PeerId | Legal entity | Role | Valid from | Valid to |
|---|---|---|---|---|---|
| `acme.mainnet` | `f5e28a…` | Acme Corp | committee-server-1 | 2026-03-01 | present |
| `acme.mainnet` | `4c210e…` | Acme Corp | aggregator | 2026-03-01 | present |
| `acme.preview` | `a91742…` | Acme Corp | committee-server-1 | 2026-04-02 | present |
Sign this table with your corporate root, version it, and include
it in your audit package. PeerId is stable when ZIPNET_SECRET
is stable; rotate only via a documented procedure (see
Rotations and upgrades).
“Was a specific server in the committee on this round?”
BroadcastRecord::servers lists every committee member whose
partial unblind was folded into the published broadcast. Combine
with your PeerId → legal entity table to produce a legal-readable
statement.
“Did this committee server operate honestly?”
You cannot prove this from the record alone — a malicious committee member can behave indistinguishably from an honest one, provided at least one other committee member is honest. (That’s the whole point of the any-trust model.) What you can attest to:
- The server was up and participating (its partial is folded in).
- The server’s key material was controlled by the claimed legal entity (via the `PeerEntry` signature).
- For TDX-gated instances, the server’s boot measurement matched the committee’s pinned MR_TD. Archive the quote alongside the instance deployment record (see below).
In regulatory settings where “operated honestly” must be proven positively, a TDX attestation is as close as the protocol gets — the quote cryptographically proves the code running inside the committee server matches a published image hash.
Archival recommendations
- Archive `Broadcasts` continuously, per instance. A committee server’s in-memory copy is the source of truth in v1; if the majority of the committee goes offline at once, the log is gone. Mirror the log into durable storage at your cadence of choice. A minimal script: open a `Zipnet::bind(&network, instance)` handle in read-only mode from a non-committee host, iterate entries newer than your checkpoint, append to a signed ledger, commit.
- Archive the `PeerId` table, keyed by instance. Version it; keep change history. A SHA-256 of this table goes into your audit manifest.
- Archive the instance configuration. For each instance:
  - Instance name.
  - `ZIPNET_COMMITTEE_SECRET`’s blake3 fingerprint (not the raw secret).
  - `RoundParams`.
  - `ConsensusConfig`.
  - Committee roster.
  - Committee MR_TD (if TDX-gated).
- Archive TDX attestation quotes. For TDX-gated instances, each committee server’s quote includes its MR_TD and RTMRs. Store them per instance, per deploy.
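The minimal archival script described above boils down to a checkpointed iteration. A self-contained simulation over an in-memory log — in production the entries would come from a read-only `Zipnet::bind(&network, instance)` handle and the ledger append would be signed; the types and function here are illustrative:

```rust
// Stand-in for an entry in the Broadcasts collection.
struct BroadcastRecord {
    round: u64,
    broadcast: Vec<u8>,
}

/// Append every record newer than `checkpoint` to the ledger and return
/// the new checkpoint to persist before the next run.
fn archive_since(log: &[BroadcastRecord], checkpoint: u64, ledger: &mut Vec<u64>) -> u64 {
    let mut latest = checkpoint;
    for rec in log.iter().filter(|r| r.round > checkpoint) {
        ledger.push(rec.round); // stand-in for "append to a signed ledger"
        latest = latest.max(rec.round);
    }
    latest
}

fn main() {
    let log: Vec<BroadcastRecord> = (1..=5)
        .map(|round| BroadcastRecord { round, broadcast: vec![0u8; 16] })
        .collect();
    let mut ledger = Vec::new();
    let cp = archive_since(&log, 2, &mut ledger); // resume from checkpoint 2
    assert_eq!(ledger, vec![3, 4, 5]);
    println!("archived up to round {cp}");
}
```

Running this on a cron cadence, with the checkpoint persisted alongside the ledger, satisfies the "mirror into durable storage" recommendation.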
Evidence package for external audit
A minimal per-quarter package, per instance:
- Instance name and its universe `NetworkId`.
- `Broadcasts` log excerpt for the quarter (signed by your corporate root).
- `PeerId → legal entity` table for that instance (signed, version-pinned).
- Instance configuration fingerprint: SHA-256 of `blake3(COMMITTEE_SECRET) || blake3(ROUND_PARAMS) || blake3(CONSENSUS_CONFIG) || instance_name`.
- Committee MR_TD (TDX-gated instances).
- List of committee membership changes, cross-referenced to git/CD deployment records.
- Incident log covering any stuck rounds, split-brain events, or membership changes in the period.
An auditor can re-derive the `ClientId`s referenced in `participants`
from the corresponding signed `PeerEntry` tickets archived from
gossip — useful if they want to ask “was client X part of round Y”.
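The configuration-fingerprint composition in the package above can be sketched with stand-in hashes (`DefaultHasher` replaces blake3 and SHA-256 here; only the concatenation order and determinism are the point):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in 64-bit hash; the real manifest uses blake3 for the inner
// fingerprints and SHA-256 for the outer digest.
fn h(bytes: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    bytes.hash(&mut s);
    s.finish()
}

/// Outer digest over the `||`-concatenation, in the fixed order:
/// committee secret, round params, consensus config, instance name.
fn config_fingerprint(secret: &[u8], round_params: &[u8], consensus: &[u8], instance: &str) -> u64 {
    let mut buf = Vec::new();
    buf.extend_from_slice(&h(secret).to_be_bytes());       // blake3(COMMITTEE_SECRET)
    buf.extend_from_slice(&h(round_params).to_be_bytes()); // blake3(ROUND_PARAMS)
    buf.extend_from_slice(&h(consensus).to_be_bytes());    // blake3(CONSENSUS_CONFIG)
    buf.extend_from_slice(instance.as_bytes());            // instance_name
    h(&buf)                                                // SHA-256 in the real manifest
}
```

Fixing the concatenation order is what lets two parties compute the fingerprint independently and compare.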
Multiple instances, shared universe
Because zipnet instances share a universe, an auditor who reads your raw gossip logs will see traffic that belongs to other instances — and possibly to other mosaik services entirely. Two consequences to call out in your audit narrative:
- Gossip-level traffic volume from your fleet is not a proxy for your instance’s traffic. A committee server on `acme.mainnet` routinely forwards discovery messages on behalf of other instances and services on the same universe.
- Peer-catalog size is likewise a universe-level quantity. Do not attempt to derive per-instance population from catalog counts.
For per-instance accounting, stick to the Broadcasts collection
and the ServerRegistry / ClientRegistry contents read through
Zipnet::bind(&network, instance).
Privacy and data retention
Published broadcasts are, by design, readable by anyone who can read
the Broadcasts collection. Treat them as public data. Archival
retention policy is a business decision; the protocol neither
enforces nor contradicts any specific retention period.
Signed PeerEntrys (carrying peers’ ClientBundles / ServerBundles)
are also public by design — they are gossiped to every universe
member. There is no way to revoke a signed entry retroactively.
Security warning
Do not publish `ZIPNET_COMMITTEE_SECRET` or any committee server’s X25519 secret, historic or current. Each leaked committee DH secret shrinks the any-trust margin, and disclosure of every committee server’s DH secret for a round retroactively breaks the anonymity of that round.
See also
- Security posture checklist — what must be protected, per role.
- Rotations and upgrades — change procedures that your audit log must cross-reference.
Security posture checklist
audience: operators
Each item below is a pre-production checklist entry. Print it,
initial it, file it with the deploy record. Work through this
checklist per instance — an honest posture on acme.mainnet
does not protect preview.alpha if the two share a fault domain or
a secret store.
Instance identity and scope
- `ZIPNET_INSTANCE` is set to a namespaced string (e.g. `acme.mainnet`) and documented in the release notes your publishers consume. No operator within the same universe uses the same string.
- `ZIPNET_UNIVERSE`, if set, points at a universe you control. The default (`zipnet::UNIVERSE`) is the shared world and is correct for most deployments.
- The instance’s MR_TD (TDX-gated instances) is published alongside the instance name in a signed channel. Publishers verify against that hash.
Committee secret handling
- `ZIPNET_COMMITTEE_SECRET` is stored only in a secret manager (vault, AWS Secrets Manager, HashiCorp Vault, k8s `Secret` resource). Never in a git repo, never in a plain environment file.
- The secret is unique per instance. Do not reuse one committee secret across `acme.mainnet` and `acme.preview` even though the operator is the same.
- Rotation procedure is documented and rehearsed (see Rotations and upgrades).
- Access to read the secret is audited. A quarterly review of access logs is on the calendar.
Committee server node hygiene
- Each committee server runs in a separate fault domain (different cloud account, different region, different operator organization if possible). The whole point of any-trust is diversity.
- In production, every committee server runs inside a TDX guest built by the mosaik image builder. The committee’s `require_mrtd(...)` validator is set to the build’s measured MR_TD. See Rebuilding a TDX image for the rebuild cadence.
- `ZIPNET_SECRET` is unique per node and stored in the node’s own secret scope (not shared with any other node).
- Committee servers listen only on the iroh port (default UDP ephemeral + relay) and the Prometheus metrics port. No other inbound exposure.
- Decommissioned committee servers have their disks wiped. DH secrets leaking from a decommissioned box are historically replayable.
Aggregator node hygiene
- The aggregator is not in the committee’s secret-possession circle. It does not have access to `ZIPNET_COMMITTEE_SECRET`.
- Aggregator memory is not a secret store — aggregates are XOR-sums whose plaintext only the committee can recover. Still, hardening the aggregator is good practice: read-only filesystem, dropped capabilities, etc.
- If you operate one aggregator per instance, each is configured with its own `ZIPNET_INSTANCE` and its own `ZIPNET_SECRET`.
Client image hygiene (TDX-gated instances)
- The client image you ship to publishers is built reproducibly. The mosaik TDX builder is deterministic — commit your toolchain and feature-flag set alongside the release.
- The committee’s `Tdx` validator lists the published client MR_TD in `require_mrtd(...)`. Publishers running any other image are rejected at bond time.
- TDX quote expiration is monitored; see Monitoring.
- Image rebuild cadence is documented. At minimum, rebuild whenever the upstream kernel or initramfs toolchain ships a security fix — a new MR_TD is cheap compared with unpatched firmware.
Client image hygiene (TDX disabled, dev/test only)
- Understood: without TDX, the client trusts the client host for DH key protection. Anyone with access to the client process can deanonymize that client’s own messages (not others’).
- Clients handling non-public messages wait for the `ClientRegistry` to include their own entry and for at least `ZIPNET_MIN_PARTICIPANTS − 1` other clients to also be registered before relying on anonymity properties.
- This posture is explicitly not used for production in TDX-gated instances.
Network hygiene
- Firewalls permit outbound UDP to iroh relays. If you run your own relay, ensure clients can reach it.
- NTP is configured on every node. Raft tolerates small skew; large skew causes election storms. TDX quote validation is also clock-sensitive.
- Prometheus metrics endpoints are NOT publicly exposed.
Archival / audit
- A job pulls the `Broadcasts` collection to durable storage at the chosen cadence, keyed by instance name (see Accounting and audit).
- The `PeerId → legal entity` registry is version-controlled, signed, and scoped per instance.
Emergency contacts
- On-call rotation documented for each node, per instance.
- Break-glass procedure for committee-secret rotation documented, per instance.
- “Who can revoke a compromised bundle ticket” is specified — note that in v1 a ticket lives in gossip until the node is removed from the universe, so the answer is “the node’s operator, by stopping the node”.
Known-not-yet-protected footguns
- Metadata from iroh. The iroh layer leaks some metadata (relay preferences, coarse geography via relay choice). A global passive adversary observing traffic patterns across relays can narrow anonymity sets.
- Cross-instance traffic correlation. Instances share a universe. A passive observer of gossip can often tell “this peer is a member of instance X” from catalog membership, even without seeing any `Broadcasts` content. Anonymity within a round is unaffected; anonymity of membership in an instance is not a property the protocol provides.
- Client message length. The protocol encrypts the message but does not pad it to a uniform length. Unusually long messages are recognizable in the broadcast. Pad your payloads to the nearest slot boundary at the application layer if this matters for you.
- Participant set disclosure. `BroadcastRecord::participants` lists every `ClientId` whose envelope was folded into the round. Knowing “client X was in this round” is not the same as knowing “client X wrote this message”, but it is visible and it leaks connection timing.
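The slot-boundary padding mitigation can be sketched at the application layer. This is a minimal sketch, not part of the protocol; `slot_len` is an assumed deployment parameter:

```rust
/// Zero-pad `msg` up to the next multiple of `slot_len` bytes, so that
/// payload lengths only reveal a slot count, not an exact byte length.
/// Application-layer sketch; `slot_len` is an assumed deployment choice,
/// not a protocol constant.
fn pad_to_slot(mut msg: Vec<u8>, slot_len: usize) -> Vec<u8> {
    assert!(slot_len > 0, "slot_len must be positive");
    let rem = msg.len() % slot_len;
    if rem != 0 {
        msg.resize(msg.len() + (slot_len - rem), 0);
    }
    msg
}
```

A real framing layer also needs an in-band length prefix so the receiver can strip the zero padding again.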
These are tracked in Roadmap to v2.
See also
Designing coexisting systems on mosaik
audience: contributors
Mosaik composes primitives — Stream, Group, Collection,
TicketValidator. It does not prescribe how a whole service — a
deployment with its own operator, its own ACL, its own lifecycle — is
shipped onto a network and made available to third-party agents. That
convention lives one layer above mosaik and has to be invented per
service family.
This page describes the convention zipnet uses, why it was picked, and what a contributor building the next service on mosaik (multisig signer, secure storage, attested oracle, …) should reuse. It is a mental model, not an API reference: the concrete instantiation is in Architecture.
The problem
A mosaik network is a universe where any number of services run concurrently. Each service:
- is operated by an identifiable organisation (or coalition) and has its own ACL
- ships as a bundle of internally-coupled primitives — usually a committee `Group`, one or more collections backed by that group, and one or more streams feeding it
- must be addressable and discoverable by external agents who do not operate it
- co-exists with many other instances of itself (testnet, staging, per-tenant deployments) and with unrelated services on the same wire
The canonical shape zipnet itself was built for is an encrypted mempool — a bounded set of TEE-attested wallets publishing sealed transactions for an unbounded set of builders to read, ordered and unlinkable to sender. Other services built on this pattern (signers, storage, oracles) have the same structural properties.
Nothing about these requirements is in mosaik itself. The library will
happily let you stand up ten Groups and thirty Streams on one
Network; it says nothing about which of them constitute “one zipnet”
versus “one multisig”.
Two axes of choice
Every design in this space picks a point on two axes.
- Network topology. Does a deployment live on its own `NetworkId`, or on a shared universe with peers of every other service?
- Discovery. How does an agent go from “I want zipnet-acme” to bonded-and-consuming without hardcoded bootstraps or out-of-band config?
Four shapes fall out:
| Shape | Topology | When to pick |
|---|---|---|
| A. Service-per-network | One NetworkId per deployment; agents multiplex many Network handles | Strong isolation, per-service attestation scope, no cross-service state |
| B. Shared meta-network | One universe NetworkId; deployments are overlays of Groups/Streams | Many services per agent, cheap composition, narrow public surface required to tame noise |
| C. Derived sub-networks | ROOT.derive(service).derive(instance) hybrids | Isolation with structured discovery, still multi-network per agent |
| D. Service manifest | Orthogonal: a rendezvous record naming all deployment IDs | Composable with A/B/C; required for discoverable-without-out-of-band-config |
Zipnet picks B for topology, with optional derived private networks for high-volume internal plumbing, and compile-time instance-salt derivation for discovery — no on-network registry required. The rest of this page unpacks why and how.
Narrow public surface
The single most important discipline in this model is that a deployment exposes a small, named, finite set of primitives to the shared network. The ideal is one or two — a stream plus a collection, two streams, a state machine plus a collection, and so on. Everything else is private to the bundle and wired up by the deployment author, who is free to hardcode internal dependencies as aggressively as they like.
Zipnet’s outward surface decomposes cleanly into two functional roles,
even though it carries several declare! types:
- write-side: `ClientRegistrationStream` and `ClientToAggregator` — ticket-gated, predicate-gated, used by external TEE clients to join a round and submit sealed envelopes.
- read-side: `LiveRoundCell`, `Broadcasts`, plus the two registries — read-only ambient round state that external agents need in order to seal envelopes and interpret finalized rounds.
An integrator’s mental model is “a way to write, a way to read”. They do not need to know the committee exists, how many aggregators there are, or how DH shuffles are scheduled. Internally the bundle looks like this:
shared network (public surface)
─────────────────────────────────────────────────────────────────
ClientRegistrationStream, ClientToAggregator ─┐
│
LiveRoundCell, Broadcasts, ClientRegistry, ◀─┤
ServerRegistry │
│
─────────────────────────────────────────────────
derived private network (optional) │ (private plumbing)
▼
Aggregator fan-in / DH-shuffle gossip Committee Group<CommitteeState>
Round-scheduler chatter AggregateToServers stream
BroadcastsStore (backs Broadcasts)
The committee Group stays on the shared network because the
public-read collections are backed by it and bridging collections
across networks is worse than the catalog noise. Only the
genuinely high-churn channels belong on a derived private network.
The three conventions
Three things make this pattern work. A contributor starting a new service should reproduce all three.
1. Instance-salt discipline
Every public ID in a deployment descends from one root:
INSTANCE = blake3("zipnet." + instance_name) // compile- or run-time
SUBMIT = INSTANCE.derive("submit") // StreamId
BROADCASTS = INSTANCE.derive("broadcasts") // StoreId
COMMITTEE = INSTANCE.derive("committee") // GroupKey material
...
The top-level instance salt is a flat-string hash: compile-time via
zipnet::instance_id!("acme.mainnet") (which expands to
mosaik::unique_id!("zipnet.acme.mainnet")) and run-time via
zipnet::instance_id("acme.mainnet") produce the same 32 bytes.
Sub-IDs within the instance chain off it with .derive() for
structural clarity.
An agent that knows instance_name can reconstruct every public ID
from a shared declare! module. The consumer-side API is:
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let receipt = zipnet.publish(b"hello").await?;
let mut log = zipnet.subscribe().await?;
Zipnet::bind is a thin constructor that derives the instance-local
IDs and returns a handle wired to them. Raw
StreamId/StoreId/GroupId values are never exposed across the
crate boundary.
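The property this discipline relies on — operator and consumer independently recomputing identical IDs from the instance name alone — can be demonstrated with a stand-in 64-bit hash. The real IDs are 32-byte blake3 digests; `DefaultHasher` here is illustration only:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the blake3 instance-salt hash; only determinism matters.
fn root_id(service: &str, instance: &str) -> u64 {
    let mut h = DefaultHasher::new();
    format!("{service}.{instance}").hash(&mut h);
    h.finish()
}

// Stand-in for `UniqueId::derive`: a child ID chained off parent + salt.
fn derive(parent: u64, salt: &str) -> u64 {
    let mut h = DefaultHasher::new();
    parent.hash(&mut h);
    salt.hash(&mut h);
    h.finish()
}
```

An operator standing up `acme.mainnet` and a consumer binding to it land on the same `SUBMIT` value with no registry in between, while `acme.preview` derives a disjoint ID space.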
2. A Deployment-shaped convention
Authors should declare a deployment’s public surface once, in one
place, so consumers can bind without reassembling ID derivations by
hand. Whether this is a literal declare::deployment! macro or a
hand-written impl Deployment is ergonomics; the constraint is that
the public surface is a declared, named, finite set of primitives —
not “whatever the bundle happens to put on the network today”.
Every deployment crate should export:
- the public `declare::stream!` / `declare::collection!` types for its surface, colocated in a single protocol module
- a `bind(&Network, instance_name) -> TypedHandles` function
- the intended `TicketValidator` composition for each public primitive
A service that exposes eight unrelated collections has probably not thought hard enough about its interface.
3. A naming convention, not a registry
Derivation from (service, instance_name) is enough for a consumer
who knows the instance name to bond to the deployment: both sides
compute the same GroupId, StreamIds, and StoreIds, and mosaik’s
discovery layer does the rest. No on-network advertisement is
required — the service does not need to advertise its own existence.
A consumer typically pins the instance as a compile-time constant:
const ACME_ZIPNET: UniqueId = zipnet::instance_id!("acme.mainnet");
let zipnet = Zipnet::bind_by_id(&network, ACME_ZIPNET).await?;
…or by string when convenient:
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
The operator’s complete public contract is three items: the universe
NetworkId, the instance name, and (if the instance is TDX-gated)
the MR_TD of the committee image. These travel via release notes,
docs, or direct handoff. Nothing about the binding path touches a
registry.
A directory may exist — a shared collection listing known instances — but it is a devops convenience for humans enumerating deployments, not part of the consumer binding path. Build it if you need it; nothing about the pattern requires it.
What this buys you
- A third-party agent’s mental model collapses to: “one `Network`, many services, each bound by instance name.”
- Multiple instances of the same service coexist trivially — each derives disjoint IDs from its salt.
- ACL is per-instance, enforced at the edge via `require_ticket` on the public primitives; no second ACL layer is needed inside the bundle.
- Internal plumbing can move to a derived private network without changing the public surface.
- Private-side schema changes (`StateMachine::signature()` bumps) are absorbed behind the instance identity, as long as operators and consumers cut releases against the same version of the deployment crate.
Where the pattern strains
Three things are not free under this convention. Every new service author should be honest about them up front.
Cross-service atomicity is out of scope
There is no way to execute “mix a message AND rotate a multisig
signer” in one consensus transaction. They are different Groups
with different GroupIds, possibly with disjoint membership. If a
service genuinely needs that — rare, but real for some
coordination-heavy cases — the right answer is a fourth primitive
that is itself a deployment providing atomic composition across
services, not an ad-hoc cross-group protocol.
Versioning under stable instance names
If StateMachine::signature() changes, GroupId changes, and
consumers compiled against the old code silently split-brain. Under
multi-instance, the expectation is that “zipnet-acme” is an
operator-level identity that outlives schema changes. Two ways to
reconcile:
- Let the instance salt carry a version (`zipnet-acme-v2`), and treat version bumps as retiring the old instance. Clean, but forces consumers to re-pin and release a new build on every upgrade.
- Keep the instance name stable across versions and require operators and consumers to cut releases in lockstep against a shared deployment crate version. Avoids churn in instance IDs, at the cost of tighter coupling between operator and consumer release cadences.
Zipnet v1 does not need to resolve this. V2 must.
Noisy neighbours on the shared network
A shared NetworkId means every service’s peers appear in every
agent’s catalog. Discovery gossip, DHT slots, and bond maintenance
scale with the universe, not with the services an agent cares about.
The escape hatch is the derived private network for internal chatter;
the residual cost — peer-catalog size and /mosaik/announce volume —
is paid by everyone. If a service’s traffic would dominate the
shared network (high-frequency metric streams, bulk replication) it
belongs behind its own NetworkId, not on the shared one. Shape A
is the correct call when the narrow-interface argument no longer
outweighs the noise argument.
Checklist for a new service
When adding a service to a shared mosaik universe, use this list:
- Identify the one or two public primitives. If you cannot, the interface is not yet designed.
- Pick a service root: `unique_id!("your-service")`.
- Define instance-salt conventions: what `instance_name` means, who picks it, whether it carries a version.
- Write a `bind(&Network, instance) -> TypedHandles` that every consumer uses. Never export raw `StreamId` / `StoreId` / `GroupId` values across the crate boundary.
- Decide which internal channels, if any, move to a derived private `Network`. Default: only the high-churn ones.
- Specify `TicketValidator` composition on the public primitives. ACL lives here.
- Document your instance-name convention in release notes or docs. Consumers compile it in; you are on the hook for keeping the name stable and the code release version-matched.
- Call out your versioning story before shipping. If you cannot answer “what happens when `StateMachine::signature()` bumps?”, you will regret it.
Cross-references
- Architecture — the concrete instantiation of this pattern for zipnet v1.
- Mosaik integration notes — gotchas and idioms specific to the primitives referenced here.
- Roadmap to v2 — where versioning-under-stable-names and cross-service composition work live.
Architecture
audience: contributors
This chapter is the concrete instantiation of the pattern described in Designing coexisting systems on mosaik for zipnet v1. It maps the paper’s three-part architecture (§2) onto mosaik primitives and identifies which of those primitives form the public surface on the shared universe versus the private plumbing that may live on a derived sub-network.
The reader is assumed to have read the ZIPNet paper, the mosaik book, and design-intro.
Deployment model recap
Zipnet runs as one service among many on the shared mosaik universe
zipnet::UNIVERSE = unique_id!("mosaik.universe"). A deployment is
a single zipnet instance: one committee, one ACL, one set of round
parameters, one operator. Many instances coexist on the universe.
An instance is identified by a short operator-chosen name
(acme.mainnet). Every public id in the instance descends from the
instance salt:
INSTANCE = blake3("zipnet." + instance_name) // root UniqueId
COMMITTEE = INSTANCE.derive("committee") // Group<M> key material
SUBMIT = INSTANCE.derive("submit") // ClientToAggregator StreamId
REGISTER = INSTANCE.derive("register") // ClientRegistrationStream StreamId
BROADCASTS = INSTANCE.derive("broadcasts") // Vec<BroadcastRecord> StoreId
LIVE = INSTANCE.derive("live-round") // Cell<LiveRound> StoreId
CLIENT_REG = INSTANCE.derive("client-registry") // Map StoreId
SERVER_REG = INSTANCE.derive("server-registry") // Map StoreId
Consumers recompute the same derivations from the same name; no on-wire registry is involved. See design-intro — Instance-salt discipline.
Public surface (what lives on UNIVERSE)
The instance’s outward-facing primitives decompose into two functional roles:
- write-side — `ClientRegistrationStream` + `ClientToAggregator`. Ticket-gated, consumed by the aggregator. External TEE clients use these to join a round and submit sealed envelopes.
- read-side — `LiveRoundCell` + `Broadcasts` + `ClientRegistry` + `ServerRegistry`. Read-only ambient round state every external agent needs in order to seal envelopes and interpret finalized rounds.
Integrators bind via the facade:
let network = Arc::new(Network::new(zipnet::UNIVERSE).await?);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let receipt = zipnet.publish(b"hello").await?;
let mut log = zipnet.subscribe().await?;
The facade hides StreamId / StoreId / GroupId entirely; they
never cross the zipnet crate boundary.
Internal plumbing (optional derived private network)
Everything that is not part of the advertised surface is
deployment-internal. In v1 it all runs on UNIVERSE alongside the
public surface; this is the simplest place to start. A future
deployment topology may move the high-churn channels onto a derived
private `Network` keyed off `INSTANCE.derive("private")`:
- `AggregateToServers` — aggregator → committee fan-out
- any footprint-scheduling gossip
- round-scheduler chatter
The committee Group<CommitteeMachine> itself stays on UNIVERSE
because LiveRoundCell / Broadcasts / the two registries are
backed by it; bridging collections across networks is worse than the
extra catalog noise. See
design-intro — Narrow public surface.
Data flow
shared universe (public surface)
+--------+ ClientToAggregator +-------------+ AggregateToServers +-------------+
| Client | (stream) | Aggregator | (stream) [*] | Committee |
| TEE | --------------------> | role | -------------------> | Group<M> |
+--------+ +-------------+ +-------------+
| | |
| ClientRegistrationStream | |
+----------------------------------->| |
| |
+-------------------+---------------------+--------------+
| |
ClientRegistry (Map<ClientId, ClientBundle>) ServerRegistry (Map<ServerId, ServerBundle>)
| |
+-------------------------+------------------------------+
|
LiveRoundCell (Cell<LiveRound>)
|
Broadcasts (Vec<BroadcastRecord>)
[*] may migrate to a derived private network in a future topology.
All four collections are `declare::collection!`-declared with
intent-addressed `StoreId`s. The three streams are
`declare::stream!`-declared the same way. In v1 every derived id salt
is a literal string; a forthcoming Deployment-shaped convention (see
design-intro §The three conventions) will replace the literal strings
with chained `.derive()` calls off `INSTANCE`.
Pipeline per round
t₀ t₁ t₂ t₃
| | | |
leader: ──── OpenRound ─── committed ─── LiveRoundCell mirrored ─── Broadcasts appended
│ (to followers) (on finalize)
▼
clients: read LiveRoundCell, seal envelope, send on ClientToAggregator
│
┌─────────────────────────────────────┘
▼
aggregator: fold envelopes until fold_deadline, send AggregateEnvelope
│
┌─────────────────────────────────────┘
▼
any committee server: receive, group.execute(SubmitAggregate)
│
▼
every committee server: see committed aggregate, compute its partial,
group.execute(SubmitPartial)
│
▼
state machine: all N_S partials gathered → finalize() → apply() pushes
BroadcastRecord
│
▼
apply-watcher on each server: mirror to LiveRoundCell / Broadcasts
Round latency is dominated by `fold_deadline` + one Raft commit round
trip per `SubmitAggregate` and one per `SubmitPartial`.
Participant roles
Clients
Implemented in zipnet_node::roles::client. Each client is an
Arc<Network> bonded to UNIVERSE, tagged zipnet.client, carrying a
zipnet.bundle.client ticket on its PeerEntry. Event loop:
loop {
    live.when().updated().await;
    let header = live.get();
    if header.round == last { continue; }
    if !header.clients.contains(&self.id) { /* retry registration */ continue; }
    let bundles = servers.get_all_in(header.servers);
    let sealed = zipnet_core::client::seal(
        self.id, &self.dh, msg, header.round, &bundles, params,
    )?;
    envelopes.send(sealed.envelope).await?;
}
Aggregator
Implemented in zipnet_node::roles::aggregator. ClientRegistry
writer. ClientToAggregator consumer. AggregateToServers producer.
Does not join the committee group.
loop {
    live.when().updated().await;
    let header = live.get();
    let mut fold = RoundFold::new(header.round, params);
    let close = tokio::time::sleep(fold_deadline);
    tokio::pin!(close); // Sleep is !Unpin; pin it before polling by &mut
    loop {
        tokio::select! {
            _ = &mut close => break,
            Some(env) = envelopes.next() => {
                if env.round != header.round
                    || !header.clients.contains(&env.client) {
                    continue;
                }
                fold.absorb(&env)?;
            }
        }
    }
    if let Ok(agg) = fold.finish() {
        aggregates.send(agg).await?;
    }
}
Committee servers
Implemented in zipnet_node::roles::server. Joins
Group<CommitteeMachine> as a Writer of ServerRegistry,
LiveRoundCell, and Broadcasts; reads ClientRegistry. Single
tokio::select! over three sources:
- `group.when().committed().advanced()` — drives the apply-watcher.
- `AggregateToServers::consumer` — feeds inbound aggregates via `execute(SubmitAggregate)`.
- A periodic tick — leader-only round driver that opens new rounds via `execute(OpenRound)`.
Why a dedicated Group<CommitteeMachine> and not just collections
The collections are each backed by their own internal Raft group. In
principle all round orchestration could be pushed into a bespoke
collection. We use a dedicated StateMachine because:
- Round orchestration needs domain transitions (Open → Aggregate → Partials → Finalize). These are hostile to Map / Vec / Cell CAS operations.
- Apply-time validation (e.g. rejecting aggregates that name non-roster clients) reads more clearly in `apply(Command)` than spread across collection CAS sequences.
- `signature()` is a clean place to pin wire / parameter version so incompatible nodes never form the same group.
The collections still pull their weight: they are the public-facing state external agents read without joining the committee group.
Identity universe
All IDs are 32-byte blake3 digests, via mosaik’s UniqueId. The
aliases used in v1:
| Alias | Derivation | Scope |
|---|---|---|
| `NetworkId` | `zipnet::UNIVERSE = unique_id!("mosaik.universe")` | shared universe |
| `INSTANCE` | `blake3("zipnet." + instance_name)` | one per deployment |
| `GroupId` | mosaik-derived from `GroupKey(INSTANCE.derive("committee"))` + `ConsensusConfig` + `signature()` + validators | one per deployment’s committee |
| `StreamId` / `StoreId` | `INSTANCE.derive("submit")`, `INSTANCE.derive("broadcasts")`, etc. in the target layout | one per public primitive |
| `ClientId` | `blake3_keyed("zipnet:client:id-v1", dh_pub)` | stable across runs iff `dh_pub` is persisted |
| `ServerId` | `blake3_keyed("zipnet:server:id-v1", dh_pub)` | same |
| `PeerId` | iroh’s ed25519 public key | one per running `Network` |
ClientId / ServerId are not iroh PeerIds. They’re stable
across restarts iff the X25519 secret is persisted. In v1 (mock TEE
default) every client run generates a fresh identity; in the TDX
path the secret is sealed and ClientId becomes a long-lived
pseudonym.
Current-state caveat: ZIPNET_SHARD
The v1 binaries (zipnet-server, zipnet-aggregator,
zipnet-client) still take a ZIPNET_SHARD flag and derive a fresh
NetworkId from unique_id!("zipnet.v1").derive(shard). This
predates the UNIVERSE + instance-salt design and will be retired as
the binaries migrate to Zipnet::bind on UNIVERSE. Treat it as a
pre-migration artifact; new code should not replicate the pattern.
The e2e integration test exercises this path today.
Boundary between zipnet-proto / zipnet-core / zipnet-node
- `zipnet-proto` — wire types, crypto primitives, XOR. No mosaik types, no async, no I/O. Anything that could be reused by an alternative transport lives here.
- `zipnet-core` — Algorithms 1/2/3 as pure functions. Depends on proto; no async, no I/O. The pure-DC-net round-trip test lives here.
- `zipnet-node` — mosaik integration. Owns `CommitteeMachine`, all `declare!` items, all role loops. Everything async, everything I/O.
- `zipnet` — SDK facade. Wraps `zipnet-node` behind `Zipnet::bind(&network, "instance_name")`; hides mosaik types from consumers.
See Crate map for the full workspace layout and design-intro — Narrow public surface for the rationale behind the facade boundary.
Cross-references
- Design intro — the generalised pattern this page instantiates.
- Committee state machine — commands, queries, `signature()` versioning.
- Mosaik integration notes — the specific 0.3.17 footguns this architecture bumps into.
- Threat model — anonymity and integrity claims anchored to the state-machine guarantees above.
Crate map
audience: contributors
Workspace at /Users/karim/dev/flashbots/zipnet/. Edition 2024, MSRV
1.93. Mosaik pinned to =0.3.17 (see CLAUDE.md
for rationale).
zipnet-proto (pure: no mosaik, no tokio, no I/O)
▲
│
zipnet-core (pure: no mosaik, no tokio, no I/O)
▲
│
zipnet-node ── mosaik 0.3.17 ── iroh 0.97 (QUIC)
▲ ▲
│ └──────────────────────────┐
│ │
zipnet (SDK facade) ├── zipnet-client
├── zipnet-aggregator
└── zipnet-server
The split between -proto, -core, and -node is load-bearing,
not cosmetic. Anything that touches tokio, mosaik, or I/O must
live in -node (or higher). Anything that could be reused by an
alternative transport lives in -proto / -core. If you find
yourself reaching for tokio::spawn or mosaik:: inside -proto or
-core, you are in the wrong crate.
zipnet-proto
Pure wire types and crypto primitives. No mosaik, no async.
| Module | Role |
|---|---|
| `wire` | `ClientEnvelope`, `AggregateEnvelope`, `PartialUnblind`, `BroadcastRecord`, `ClientId`, `ServerId`, `RoundId` |
| `crypto` | HKDF-SHA256 salt composition, AES-128-CTR pad generator, blake3 falsification tag |
| `keys` | `DhSecret` (X25519 `StaticSecret`), `ClientKeyPair`, `ServerKeyPair`, public `ClientBundle` / `ServerBundle` |
| `params` | `RoundParams` (broadcast shape) |
| `xor` | `xor_into`, `xor_many_into` over equal-length buffers |
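The `xor` module’s contract, and the cancellation algebra the round pipeline relies on, can be sketched as follows (assumed shape — the real `zipnet_proto::xor` API may differ in details):

```rust
/// XOR `src` into `dst` in place; panics if lengths differ.
/// Sketch of the equal-length-buffer contract described for
/// `zipnet_proto::xor`.
fn xor_into(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "xor over equal-length buffers only");
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}
```

Folding a pad-blinded message together with the same pad recovers the message; the committee’s partial unblinds are the distributed form of that second `xor_into`.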
WIRE_VERSION is bumped any time a wire or params shape changes.
CommitteeMachine::signature() in zipnet-node mixes this in so
nodes with different wire versions will never form a group.
zipnet-core
Paper’s algorithms as pure functions over -proto types. No async.
| Module | Role |
|---|---|
| client::seal | Algorithm 1 — TEE-side sealing of one envelope |
| aggregator::RoundFold | Algorithm 2 — stateful XOR fold of envelopes for one round |
| server::partial_unblind | Algorithm 3 — per-server partial computation |
| server::finalize | Committee combine — aggregate + partials → broadcast |
| slot | Deterministic slot assignment + slot layout helpers |
The full round trip is exercised by
server::tests::e2e_two_servers_three_clients, which constructs a
3-server / 4-client setup (2 talkers + 2 cover) and asserts that the
final BroadcastRecord contains each talker’s plaintext at the
expected slot with a valid falsification tag. No transport is
involved — this is the pure-algebra proof.
zipnet-node
The only non-SDK crate that imports mosaik. Hosts the
declare! items, the committee state machine, and the role event
loops.
| Module | Role |
|---|---|
| protocol | declare::stream! + declare::collection! items, tag constants, ticket class constants |
| committee | CommitteeMachine: StateMachine, Command, Query, QueryResult, LiveRound, CommitteeConfig |
| tickets | BundleValidator<K>: TicketValidator for client / server bundle tickets |
| roles::common | NetworkBoot helper that wraps iroh secret, tags, tickets, and mDNS setup |
| roles::client | client event loop |
| roles::aggregator | aggregator event loop |
| roles::server | committee server event loop (single tokio::select! over three event sources) |
The role modules are reusable as a library — the three binaries are
thin CLI wrappers around them. Test code in
crates/zipnet-node/tests/e2e.rs reuses the same primitives but
inlines the server loop so it can inject a pre-built Arc<Network>
and cross-sync_with all peers before anything starts (same pattern
as mosaik’s examples/orderbook).
protocol.rs today vs target
protocol.rs currently declares its StreamId / StoreId literals
as flat strings ("zipnet.stream.client-to-aggregator", etc.). The
target per design-intro
is INSTANCE.derive("submit") / .derive("broadcasts") / … chained
off the per-deployment instance salt so multiple instances can
coexist on one mosaik universe without colliding. The migration
removes the ZIPNET_SALT.derive(shard) NetworkId scoping in
favour of the shared zipnet::UNIVERSE constant.
zipnet (SDK facade)
Public surface for consumers. Wraps zipnet-node and hides all
mosaik types (StreamId, StoreId, GroupId) from callers.
| Module | Role |
|---|---|
| environments | UNIVERSE constant, instance_id(&str) fn, instance_id! macro |
| client | Zipnet::bind, Zipnet::bind_by_id, publish, subscribe, shutdown |
| error | Error { WrongUniverse, ConnectTimeout, Attestation, Shutdown, Protocol } |
| types | Receipt, Round, Outcome, Message |
| driver | internal task that plumbs publishes onto ClientToAggregator and broadcasts back |
Re-exports from mosaik that the SDK intentionally surfaces:
UniqueId, NetworkId, Tag, unique_id!. Nothing else is
re-exported — callers that need raw mosaik types have fallen off the
supported path and should drop to zipnet-node directly.
zipnet::instance_id(name) and zipnet::instance_id!("name") must
produce byte-identical outputs; the macro lowers to
mosaik::unique_id!(concat!("zipnet.", $name)) and the runtime fn is
UniqueId::from("zipnet." + name). If you change one, change the
other.
Binaries
Thin CLI wrappers around zipnet-node::roles::*. In v1 they still
take a ZIPNET_SHARD flag and scope to ZIPNET_SALT.derive(shard);
this predates the UNIVERSE + instance design and will be retired as
the binaries migrate to Zipnet::bind on UNIVERSE.
| Crate | Flags of note |
|---|---|
| zipnet-client | ZIPNET_MESSAGE, ZIPNET_CADENCE |
| zipnet-aggregator | ZIPNET_FOLD_DEADLINE |
| zipnet-server | ZIPNET_COMMITTEE_SECRET, ZIPNET_MIN_PARTICIPANTS, ZIPNET_ROUND_PERIOD, ZIPNET_ROUND_DEADLINE |
Each binary also takes the common ZIPNET_SHARD, ZIPNET_SECRET,
ZIPNET_BOOTSTRAP, ZIPNET_METRICS — see Environment
variables.
Feature flags
- zipnet-node/tee-tdx (off by default) — folds mosaik::tickets::Tdx::new().require_own_mrtd()? into the committee's admission validators. Requires mosaik's tdx feature (on by default) and TDX hardware.
- zipnet-client/tee-tdx, zipnet-server/tee-tdx — re-export flips of the node crate's flag.
Mock TEE is the default path (// SIMPLIFICATION: in source); TDX is
opt-in for v1 and the critical-path enforcement lands in v2 (see
Roadmap).
Dependency choices worth knowing
- x25519-dalek 2.0 pins rand_core 0.6 (not workspace rand 0.9). We break workspace coherence in zipnet-proto/Cargo.toml by pulling rand_core = "0.6" explicitly for OsRng compatibility with StaticSecret::random_from_rng. The crate-proper rand dep is workspace-pinned.
- mosaik = "=0.3.17" — the API we developed against. Upgrades are expected to break compile; the declare::stream! / declare::collection! macros are stable-ish, but the ticket and group APIs have shifted across minor versions.
Cryptography
audience: contributors
All cryptographic primitives live in zipnet-proto. This chapter is a
rationale + proof-sketch document; correctness tests are in
zipnet-proto::crypto::tests and the end-to-end algebraic test is
zipnet_core::server::tests::e2e_two_servers_three_clients. Nothing on
this page is deployment-topology-specific — the KDF schedule and
falsification-tag construction are identical under any instance layout.
See design-intro for how the instance salt
(and hence schedule_hash, once footprint scheduling lands in v2)
attaches to a deployment.
Primitives
| Purpose | Primitive | Crate |
|---|---|---|
| Key agreement | X25519 | x25519-dalek 2.0 |
| Key derivation | HKDF-SHA256 | hkdf 0.12 |
| Pad generation | AES-128 in CTR mode | aes 0.8 + ctr 0.9 |
| Falsification tag | keyed-blake3 | blake3 1.8 |
| ID derivation | keyed-blake3 | blake3 1.8 |
| Peer-entry signatures | ed25519 | via iroh |
Notable negatives: no signatures from the prototype itself — clients do
not ed25519-sign their envelopes because iroh already signs the
PeerEntry that carries their bundle and the stream transport is
authenticated QUIC. We rely on mosaik’s session security, not on an
application-level signature scheme.
Per-round key schedule
For each (client, server, round) pair the protocol computes a one-time
pad P of length B = num_slots * slot_bytes:
shared = X25519(client_sk, server_pk) // 32 bytes
salt = params_prefix ‖ round ‖ schedule_hash // 56 bytes
prk = HKDF-Extract(salt, shared) // 32 bytes
key = HKDF-Expand(prk, "zipnet/pad/v1", 16) // 16 bytes
iv = round_le ‖ zeros // 16 bytes
P = AES-128-CTR(key, iv, zeros of length B)
where params_prefix is a little-endian encoding of (wire_version, num_slots, slot_bytes, tag_len) and schedule_hash is the 32-byte
NO_SCHEDULE constant in v1 (the footprint scheduling reservation vector
hash in v2).
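The salt composition can be sketched in std-only Rust. The four params fields are assumed here to be u32 little-endian — 16 + 8 + 32 = 56 bytes, matching the schedule's comment — but the canonical field widths live in zipnet-proto:

```rust
// Hypothetical sketch of the 56-byte salt layout described above.
// Field widths (four u32 params, u64 round) are assumptions, not the
// canonical zipnet-proto encoding; only the 56-byte total is from the text.
fn pad_salt(
    wire_version: u32,
    num_slots: u32,
    slot_bytes: u32,
    tag_len: u32,
    round: u64,
    schedule_hash: &[u8; 32],
) -> Vec<u8> {
    let mut salt = Vec::with_capacity(56);
    salt.extend_from_slice(&wire_version.to_le_bytes()); // params_prefix,
    salt.extend_from_slice(&num_slots.to_le_bytes());    // little-endian
    salt.extend_from_slice(&slot_bytes.to_le_bytes());
    salt.extend_from_slice(&tag_len.to_le_bytes());
    salt.extend_from_slice(&round.to_le_bytes());        // round, 8 bytes
    salt.extend_from_slice(schedule_hash);               // NO_SCHEDULE in v1
    salt
}

fn main() {
    let salt = pad_salt(1, 64, 256, 16, 7, &[0u8; 32]);
    assert_eq!(salt.len(), 56); // matches the "// 56 bytes" comment above
    println!("salt is {} bytes", salt.len());
}
```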
Why this structure
- Salt over (params, round, schedule_hash) binds the pad to every negotiated round parameter. A client or server computing with a different RoundParams derives a different pad; in the XOR algebra this reduces the colliding result to noise, not to a silent crypto vulnerability. The WIRE_VERSION in the salt prefix extends this to major-version boundaries.
- HKDF-Extract over the raw DH shared secret, not a hash of it. X25519 shared secrets are high-entropy but not uniformly distributed byte strings; HKDF's extract step is the standard way to concentrate that entropy into a uniform PRK.
- AES-128-CTR with a round-prefixed IV. A fresh IV = (round ‖ 0⁸) gives every round a non-overlapping counter space; the sequence of counters within a round is (round ‖ 0⁸) + 0, 1, 2, .... As long as two rounds never share round, the AES key–IV pair is never reused. The round: u64 ensures uniqueness across realistic deployments.
- HKDF-Expand labelled "zipnet/pad/v1". The label guards against accidental reuse of the same PRK across crypto contexts; bumping it to "zipnet/pad/v2" is free domain separation.
- AES-128-CTR rather than a dedicated stream cipher. It is AES-NI-accelerated, the keystream is pseudorandom, and CTR output combines by XOR — exactly the commutative algebra DC-nets require.
What this buys
Any honest client C and honest server S that agree on the full input tuple (shared_secret, wire_version, num_slots, slot_bytes, tag_len, round, schedule_hash) derive byte-identical pads. XOR is commutative and associative, so the order in which the aggregator and the committee fold in their contributions is irrelevant.
For any adversary who does not know shared_secret, the pad is
indistinguishable from uniformly random under the standard DDH assumption
on Curve25519 (for the X25519 step) and the PRF security of AES-128
(for the expansion step), given a secure HKDF.
What this does not buy
- Forward secrecy. A compromise of shared_secret compromises every past and future round for that (client, server) pair until the secret is rotated. v2 ratchets shared_secret ← HKDF-Extract(shared_secret, "ratchet") at each round boundary.
- Authentication of the envelope itself. The mosaik transport authenticates the sender PeerId (ed25519); the pad binds the envelope to round and client via the KDF inputs. But an adversary who can inject bytes at the transport layer as a specific peer can replay or mutate envelopes. We rely on iroh's QUIC/TLS.
Falsification tags
The paper’s §3 “falsification tag” is a keyed-blake3 XOF of the plaintext message:
pub fn falsification_tag(message: &[u8], tag_len: usize) -> Vec<u8> {
let key = blake3::derive_key("zipnet:falsification-tag:v1", &[]);
let mut h = blake3::Hasher::new_keyed(&key);
h.update(message);
let mut buf = vec![0u8; tag_len];
h.finalize_xof().fill(&mut buf);
buf
}
Why keyed-blake3, not HMAC
- Keyed-blake3 is a PRF under the standard security argument for keyed blake3, and it is enormously faster than HMAC-SHA256 at the sizes involved.
- The key is a domain-separating constant ("zipnet:falsification-tag:v1"), not a secret; the goal is not authentication against an adversary, it's cross-slot collision resistance.
What the tag protects against
- Malicious client corrupting another honest client’s slot. Slots are deterministically assigned (v1) or reservation-checked (v2). Collisions across clients overwrite both messages with their XOR. An honest client’s tag is computed on its original message; after the XOR with garbage, the tag at the published slot no longer matches the visible payload bytes → any observer rejects the slot as corrupted.
- Malicious client writing garbage in an unused slot. The unused-slot hypothesis fails the tag check; observers skip it.
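The observer-side check can be sketched as follows. stand_in_tag below is a toy checksum standing in for the keyed-blake3 falsification_tag shown above, purely so the example is self-contained; the real check recomputes the blake3 tag over the slot's visible payload bytes:

```rust
// Toy checksum standing in for falsification_tag (keyed blake3, above).
// NOT the real construction — illustration only.
fn stand_in_tag(message: &[u8], tag_len: usize) -> Vec<u8> {
    (0..tag_len)
        .map(|i| {
            message
                .iter()
                .fold(i as u8, |a, b| a.wrapping_mul(31).wrapping_add(*b))
        })
        .collect()
}

/// A slot is accepted only if the tag trailing the payload matches a
/// recomputation over the visible payload bytes.
fn slot_ok(slot: &[u8], tag_len: usize) -> bool {
    let (payload, tag) = slot.split_at(slot.len() - tag_len);
    stand_in_tag(payload, tag_len) == tag
}

fn main() {
    let payload = b"hello".to_vec();
    let mut slot = payload.clone();
    slot.extend(stand_in_tag(&payload, 4));
    assert!(slot_ok(&slot, 4));

    // XOR-corrupt one payload byte (a cross-client collision): the tag
    // no longer matches, so the observer rejects the slot.
    slot[0] ^= 0xFF;
    assert!(!slot_ok(&slot, 4));
    println!("corrupted slot rejected");
}
```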
What the tag does not protect against
- A malicious client corrupting its own slot by writing nonsense and computing a tag over that nonsense. In v1 this is a trivial DoS against the client itself; the protocol treats the published broadcast as authoritative.
- Cross-round correlation attacks based on message length or pattern.
Identity derivation
ClientId = blake3_keyed("zipnet:client:id-v1", dh_pub),
ServerId = blake3_keyed("zipnet:server:id-v1", dh_pub), both XOF’d
to 32 bytes.
Separate domain strings per role prevent an adversary who harvests a
client’s dh_pub from spoofing a server with the same identifier, which
would matter if we ever compared ClientIds and ServerIds inside the
state machine (we don’t, but the separation is free).
Constant-time concerns
- X25519 in x25519-dalek is constant-time by design.
- AES-128-CTR in aes + ctr uses hardware AES (AES-NI on x86_64, the crypto extensions on recent ARM) — that path is constant-time.
- HKDF (SHA-256) is constant-time over inputs of a fixed length.
- XOR buffers are word-wise and constant-time.
- The equality check for tag verification is Vec::eq — not constant-time. This is fine: tag comparison is against a public broadcast, not against a secret.
If a contributor adds a secret comparison path, they should reach for
subtle::ConstantTimeEq rather than ==.
Cryptographic agility
None. The prototype nails down curve (X25519), hash (blake3, SHA-256),
and cipher (AES-128) because each choice is folded into a string constant
in the KDF. To change any of them, bump WIRE_VERSION and the
corresponding label ("zipnet/pad/v1" → "zipnet/pad/v2").
Rotating the curve to, say, X448 would require a new DhSecret type and
a corresponding ClientBundle / ServerBundle layout change. There is
no on-wire negotiation of crypto parameters — nodes that disagree are
isolated into disjoint groups by construction.
The committee state machine
audience: contributors
Source: crates/zipnet-node/src/committee.rs.
Trait shape
impl StateMachine for CommitteeMachine {
type Command = Command;
type Query = Query;
type QueryResult = QueryResult;
type StateSync = LogReplaySync<Self>;
fn signature(&self) -> UniqueId { ... }
fn apply(&mut self, cmd: Command, ctx: &dyn ApplyContext) { ... }
fn query(&self, q: Query) -> QueryResult { ... }
fn state_sync(&self) -> LogReplaySync<Self> { LogReplaySync::default() }
}
LogReplaySync is the default; the committee state is small (< 1 KB per
round) so replaying the log on catch-up is cheap. When we add per-round
archival in v2 we’ll swap in a snapshot strategy.
Commands
pub enum Command {
OpenRound(LiveRound),
SubmitAggregate(AggregateEnvelope),
SubmitPartial(PartialUnblind),
}
Each command is idempotent:
- OpenRound: resets current to a fresh InFlight(header). If a previous round was not finalized, its state is silently dropped — the leader is the authority on when to move on.
- SubmitAggregate: first valid submission wins. Duplicates from follower forwarding are silently ignored. Validation checks:
  - round matches current.header.round,
  - payload length matches config.params.broadcast_bytes(),
  - participant set is non-empty,
  - every participant is in current.header.clients (no rogue clients).
- SubmitPartial: first partial per (round, server) wins. Validation:
  - round matches,
  - partial length matches,
  - server is in current.header.servers.
When a partial submission brings the total to N_S and an aggregate has
been submitted, apply() calls zipnet_core::server::finalize(...) and
pushes the resulting BroadcastRecord into self.broadcasts. Everything
after that is apply()-synchronous and deterministic.
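The first-wins admission rule can be sketched with a toy map; the RoundId / ServerId aliases and the Partials struct below are stand-ins, not the real zipnet definitions:

```rust
use std::collections::hash_map::{Entry, HashMap};

// Stand-in types; the real definitions live in zipnet-proto / zipnet-node.
type RoundId = u64;
type ServerId = [u8; 32];

#[derive(Default)]
struct Partials {
    seen: HashMap<(RoundId, ServerId), Vec<u8>>,
}

impl Partials {
    /// Returns true if the partial was admitted (first submission wins);
    /// duplicates are silently ignored, mirroring apply() idempotency.
    fn submit(&mut self, round: RoundId, server: ServerId, partial: Vec<u8>) -> bool {
        match self.seen.entry((round, server)) {
            Entry::Vacant(e) => {
                e.insert(partial);
                true
            }
            Entry::Occupied(_) => false,
        }
    }
}

fn main() {
    let mut p = Partials::default();
    let s = [7u8; 32];
    assert!(p.submit(1, s, vec![0xAA])); // first wins
    assert!(!p.submit(1, s, vec![0xBB])); // duplicate silently ignored
    assert_eq!(p.seen[&(1, s)], vec![0xAA]); // original submission retained
    println!("idempotent");
}
```

Replaying the same command sequence against a fresh Partials yields the same map, which is the property log replay on recovery depends on.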
Queries
pub enum Query {
LiveRound,
CurrentAggregate,
PartialsReceived,
RecentBroadcasts(u32),
}
Queries are read-only and do not replicate. The apply-watcher task on
each server uses weak-consistency queries to drive its side effects
(mirror LiveRound to LiveRoundCell, push broadcasts into the
Broadcasts vec collection, issue partial submissions when an aggregate
appears).
Signature versioning
fn signature(&self) -> UniqueId {
let tag = format!(
"zipnet.committee.v{WIRE_VERSION}.slots={}.bytes={}.min={}",
self.config.params.num_slots,
self.config.params.slot_bytes,
self.config.min_participants,
);
UniqueId::from(tag.as_str())
}
signature() is folded into the GroupId by mosaik, alongside the
GroupKey (derived from INSTANCE.derive("committee")) and the
consensus config. Therefore:
- Bumping WIRE_VERSION (wire or params breaking change) isolates old nodes from new.
- Changing num_slots, slot_bytes, or min_participants likewise forces a fresh group, so nodes can't silently fork on divergent config.
- Changing the instance name (and hence INSTANCE) disjoins the deployments; two acme.mainnet / acme.testnet deployments share no GroupId even under identical params. See design-intro — Instance-salt discipline.
If you add a field to CommitteeConfig or change apply semantics
without touching signature(), two nodes with incompatible code will
form the same group and diverge at the apply level. Always bump the
signature string when apply() or Command semantics change. That’s
the invariant.
What this machine guarantees vs. does not
The state machine guarantees round ordering, exactly-once partial admission, and deterministic finalization under Raft's normal crash-fault tolerance. It deliberately guarantees nothing about anonymity — anonymity is a property of the cryptographic protocol (any-honest-server DC-net algebra, see Threat model), not of consensus. Byzantine committee members cannot break anonymity via the state-machine path; they can only withhold or submit bogus partials, which is an availability problem.
Apply-context usage
ApplyContext exposes deterministic metadata. We use it only in a debug
log right now:
debug!(
round = %header.round,
"committee: opening round at index {:?}",
ctx.log_position(),
);
Anything derived from ctx is safe to use in state mutation because
mosaik guarantees it is identical on every replica. If v2 needs a
per-round random salt, pulling it from ctx.log_position() and
ctx.current_term() is the deterministic path.
The apply-watcher
The reason apply() doesn’t write directly to the public collections:
apply() is synchronous and must be free of I/O to keep the state
machine deterministic. Side effects on the outside world go through a
task that polls the group after every commit advance:
tokio::select! {
_ = group.when().committed().advanced() => {
let live = group.query(Query::LiveRound, Weak).await?.into();
let agg = group.query(Query::CurrentAggregate, Weak).await?.into();
let recent = group.query(Query::RecentBroadcasts(8), Weak).await?.into();
reconcile_into_collections(live, agg, recent).await;
maybe_submit_my_partial(agg).await;
}
// ...
}
This is the same pattern the mosaik book recommends for “state machine emits events, side-effect task consumes them”. Because queries are weak-consistency reads of the local replica, they are lock-free and fast; by the time we see the commit advance, the local apply has already run.
Idempotency and replays
- A follower that crashes mid-apply replays the log on recovery. Because apply() is deterministic, replaying yields the same state.
- A client that never sees its round finalized and retries on the next LiveRound is safe: the new round has a fresh RoundId, new pads, and a new envelope. No anti-replay logic is needed at the protocol layer.
- An aggregator retrying SubmitAggregate after a leader flip is safe: the state machine rejects duplicates.
- A server retrying SubmitPartial after its own restart is safe for the same reason.
Sizes of in-flight state
| Field | Size per round |
|---|---|
| LiveRound.clients | N * 32 bytes |
| LiveRound.servers | N_S * 32 bytes |
| aggregate.aggregate | B bytes (default 16 KiB) |
| partials | N_S * (32 + 8 + B) bytes |
Finalization pushes one BroadcastRecord (size: B + N*32 + N_S*32) into
self.broadcasts which is retained in RAM indefinitely in v1. For
long-running deployments you will want external archival; see
Operators — Accounting and audit.
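As a worked instance of the table's formulas under the stated default B = 16 KiB (N = 4 and N_S = 3 are illustrative round sizes, not defaults from the source):

```rust
fn main() {
    // Illustrative numbers: B = 16 KiB is the table's stated default;
    // N and N_S are made up for the example.
    let b: usize = 16 * 1024; // broadcast_bytes
    let n = 4usize;           // clients in the round
    let n_s = 3usize;         // committee servers

    let clients = n * 32;                 // LiveRound.clients
    let servers = n_s * 32;               // LiveRound.servers
    let partials = n_s * (32 + 8 + b);    // per-server id + round + payload
    let record = b + n * 32 + n_s * 32;   // one BroadcastRecord

    // prints: clients=128 servers=96 partials=49272 record=16608
    println!("clients={clients} servers={servers} partials={partials} record={record}");
}
```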
Mosaik integration notes
audience: contributors
Drop-in advice, footguns, and places where the prototype bumped into the mosaik 0.3.17 API. This is a grab-bag — sorted roughly by how likely a contributor is to trip over each item. For the higher-level deployment conventions that sit above mosaik, see design-intro.
Instance-salt derivation
Every public id in a zipnet deployment descends from the instance salt:
use mosaik::{UniqueId, unique_id};
// Compile-time: typos become build errors.
pub const ACME: UniqueId = zipnet::instance_id!("acme.mainnet");
// expands to unique_id!("zipnet.acme.mainnet")
// Runtime: same 32 bytes as the macro for the same name.
let id = zipnet::instance_id("acme.mainnet");
assert_eq!(id, ACME);
// Sub-ids chain with .derive().
let committee_key = ACME.derive("committee"); // GroupKey material
let submit_stream = ACME.derive("submit"); // StreamId
let broadcasts_store = ACME.derive("broadcasts"); // StoreId
The invariant: instance_id(name) and instance_id!("name") must
produce byte-identical outputs. The macro lowers to
unique_id!(concat!("zipnet.", $name)); the runtime fn is
UniqueId::from("zipnet." + name). Change one, change the other.
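The invariant can be illustrated with a stand-in hash — hash32 below is a placeholder, not UniqueId's real derivation. The point is input equality: both paths must feed the identical "zipnet.<name>" string into whatever UniqueId::from actually does:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder for UniqueId's real derivation; only input equality matters.
fn hash32(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

// What the macro path lowers to: concat!("zipnet.", <literal>).
const MACRO_INPUT: &str = concat!("zipnet.", "acme.mainnet");

// The runtime path: "zipnet." + name, composed at call time.
fn instance_id(name: &str) -> u64 {
    hash32(&format!("zipnet.{name}"))
}

fn main() {
    // Same input string → same output, whichever path built it.
    assert_eq!(instance_id("acme.mainnet"), hash32(MACRO_INPUT));
    println!("byte-identical inputs");
}
```

If one path ever prepends a different prefix, every derived StreamId / StoreId / GroupKey downstream silently diverges — which is exactly why the two definitions must change together.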
Never expose raw StreamId / StoreId / GroupId values across the
zipnet crate boundary — Zipnet::bind is the only supported path.
The declare::stream! predicate direction
Reading the macro source (mosaik-macros/src/stream.rs in the mosaik
repo) reveals the following:
“For require and require_ticket, the side prefix describes who must satisfy the requirement, not who performs the check. consumer require_ticket: V means consumers need a valid ticket, so the producer runs the validator — route to the opposite side.”
So in our ClientToAggregator stream:
declare::stream!(
pub ClientToAggregator = ClientEnvelope,
"zipnet.stream.client-to-aggregator",
producer require: |p| p.tags().contains(&CLIENT_TAG),
consumer require: |p| p.tags().contains(&AGGREGATOR_TAG),
producer online_when: |c| c.minimum_of(1).with_tags("zipnet.aggregator"),
);
- producer require: |p| p.tags().contains(&CLIENT_TAG) → “the producer must have the zipnet.client tag” → enforced on the consumer side (the aggregator subscribes only to peers tagged zipnet.client).
- consumer require: |p| p.tags().contains(&AGGREGATOR_TAG) → “the consumer must have the zipnet.aggregator tag” → enforced on the producer side (the client accepts subscribers only if they’re tagged zipnet.aggregator).
Getting this inverted produces symptoms like rejected consumer connection: unauthorized in the producer logs, with consumer PeerEntry
tag counts of 1 that don’t match the expected role. The clue is that the
producer is the one rejecting; consumer-requires apply on the producer.
Without both clauses, any peer on the network could subscribe to your
client’s envelope stream — defeating the point. The ticket-based analog
is require_ticket, which is what you want in the TDX-enabled path.
Group<M>, Map<K,V>, Network are not Clone
All three hold Arc internally but don’t derive or implement Clone.
When you need to share them across spawned tasks, wrap in a fresh Arc:
let network = Arc::new(builder.build().await?);
let group = Arc::new(network.groups()...join());
tokio::spawn({
let group = Arc::clone(&group);
async move { ... group.execute(...).await ... }
});
Group::execute, Group::query, Group::feed return futures that are
'static — they take ownership of the arguments they need at the moment
of call, so passing Arc<Group> + Arc::clone() into each task is the
straightforward pattern.
The server role deliberately keeps the Group inside a single
tokio::select! rather than spawning task-per-responsibility so we avoid
the Arc noise. The integration test in zipnet-node/tests/e2e.rs does the
same.
QueryResultAt<M> doesn’t pattern-match directly
group.query(...).await? returns Result<QueryResultAt<M>, QueryError<M>>
where QueryResultAt<M> is #[derive(Deref)] with Target = M::QueryResult.
You cannot pattern-match QueryResultAt against variants of your
QueryResult. The canonical destructure:
let qr = group.query(Query::LiveRound, Consistency::Weak).await?;
let QueryResult::LiveRound(live) = qr.into() else { return Ok(()) };
QueryResultAt::into is inherent (not From) and returns the
M::QueryResult by value.
Cell write / clear
let cell = LiveRoundCell::writer(&network);
cell.set(header).await?; // atomic replace
cell.clear().await?; // empty
There is no unset — the method is clear. Cell already has
Option-like emptiness semantics, so Cell<T> gives you the “sometimes
present” store you’d expect; no need for Cell<Option<T>>.
StateMachine::apply can’t be async
Apply is synchronous by contract. Side effects that need async (e.g. writing to a collection, sending a stream, issuing another command) must happen in a separate task that watches the commit cursor and reads the state machine via queries:
loop {
tokio::select! {
_ = group.when().committed().advanced() => reconcile().await?,
Some(msg) = stream.next() => forward(msg).await?,
_ = period.tick() => maybe_open_round().await?,
}
}
The apply-watcher in zipnet-node/src/roles/server.rs::reconcile_state is
the canonical implementation in our prototype.
InvalidTicket is a unit struct
mosaik::tickets::InvalidTicket doesn’t have ::new; it’s a bare
struct InvalidTicket;. Return it as:
return Err(InvalidTicket);
Context goes into the tracing log, not into the error, because the
error is opaque at the protocol level.
GroupKey::from(Digest)
GroupKey: From<Secret> where Secret = Digest. The ergonomic
constructor from a caller-provided string:
let key = GroupKey::from(mosaik::Digest::from("my-committee-secret"));
GroupKey::from_secret(impl Into<Secret>) is the same thing; either works.
GroupKey::random() is present but not what you want in production
because every committee member must converge on the same value.
Discovery on localhost
iroh’s pkarr/Mainline DHT bootstrap is unreliable for same-box tests.
For integration tests, cross-call sync_with between every pair of
networks (same pattern as mosaik’s examples/orderbook::discover_all):
async fn cross_sync(nets: &[&Arc<Network>]) -> anyhow::Result<()> {
for (i, a) in nets.iter().enumerate() {
for (j, b) in nets.iter().enumerate() {
if i != j {
a.discovery().sync_with(b.local().addr()).await?;
}
}
}
Ok(())
}
For out-of-process binaries, pass an explicit --bootstrap <peer_id>
pointing at a well-known node.
Tag = UniqueId, no tag! macro
Book examples show tag!("...") but 0.3.17 exports no such macro. Tag
is an alias for UniqueId, so use unique_id!("...") for compile-time
construction:
pub const CLIENT_TAG: Tag = unique_id!("zipnet.client");
Runtime construction is Tag::from("...") via the From<&str> impl on
UniqueId.
Declaring collections that don’t exist at use time
The declare::collection! macro refers to its value type by path, so you
can declare a collection over a type defined later in the same crate:
// src/protocol.rs
use crate::committee::LiveRound;
declare::collection!(
pub LiveRoundCell = mosaik::collections::Cell<LiveRound>,
"zipnet.collection.live-round",
);
LiveRound is defined in src/committee.rs; the macro’s expansion
resolves the path at compile time in the usual way.
Network::builder(...).with_mdns_discovery(true)
mDNS is off by default in 0.3.17. For single-box testing and for clusters on the same LAN, turning it on collapses discovery latency from minutes (DHT bootstrap) to sub-seconds. Costs nothing on WAN deployments where it silently no-ops.
Network::builder(network_id)
.with_mdns_discovery(true)
.with_discovery(discovery::Config::builder().with_tags(tags))
.build().await?;
We enable it unconditionally in NetworkBoot::boot.
TDX gating: install own ticket, require others’
Mosaik’s TDX support composes on both sides of the peer-entry dance. The idiomatic zipnet committee setup:
// On boot, if built with the tee-tdx feature:
network.tdx().install_own_ticket()?; // attach our quote to our PeerEntry
// When joining the committee or a public collection, require peers
// to present a matching TDX quote:
use mosaik::tickets::Tdx;
let tdx_validator = Tdx::new().require_mrtd(expected_mrtd);
// Stack with BundleValidator via multi-require_ticket:
group_builder
.require_ticket(BundleValidator::<ServerBundleKind>::new())
.require_ticket(tdx_validator);
expected_mrtd comes from the reproducible committee-image build and
is published alongside the instance name (see
design-intro — A naming convention, not a registry).
In v1, BundleValidator is the only admission check in the non-TDX
path; TDX critical-path enforcement lands in v2
(Roadmap).
Threat model
audience: contributors
This chapter restates the paper’s adversary model (§3.3) against the
concrete objects that exist in our prototype, and gives proof sketches
for the claims we make. The claims are scoped to one zipnet
instance — the committee Group<CommitteeMachine> identified by
INSTANCE.derive("committee") for a given operator-chosen name
(see design-intro — Instance-salt discipline).
Distinct instances on the same universe have disjoint GroupIds,
disjoint rosters, and disjoint anonymity sets; what holds for one
says nothing about another. Multi-instance composition is out of
scope here.
Goals and non-goals
Goal: unlinkability of (author, message) for messages published in
the Broadcasts collection, against any adversary that controls at most
N_S − 1 of N_S committee servers, the aggregator, the TEE host (of
an unbounded subset of clients), and the network. The adversary does
not control a strict majority of the honest clients. (The precise
(t, n)-anonymity formulation is in Appendix A of the paper.)
Non-goals:
- Byzantine fault tolerance of the consensus layer. Mosaik’s Raft variant is crash-fault tolerant, not Byzantine.
- Availability under any adversarial committee participation. In v1, a single crashed committee server halts round progression.
- Confidentiality of application payload. Once finalized, broadcast is world-readable by design.
- Resistance to message-length side channels (see security checklist).
Attacker powers
What the adversary can do:
- Read and modify any packet on the wire. iroh/QUIC authenticates peer identities, so the adversary cannot impersonate an honest node, but can block, delay, or corrupt packets (triggering Raft timeouts and stream reconnects).
- Control the operating system of any non-TEE node, including committee servers it is designated to operate.
- Issue arbitrary Commands to the committee via a corrupt server (which forwards its own commands into the Raft log) or via a corrupt client (which sends arbitrary ClientEnvelopes through the aggregator).
- Compromise the TEE of any number of clients (and read their DH secrets) in the v1 mock path.
What the adversary cannot do (by assumption or by protocol):
- Compromise the TEE of a client in the v2 TDX path without triggering attestation failure. (Formal: SGX/TDX bound by the hardware root of trust.)
- Compromise the DH secret of every committee server simultaneously — anonymity requires at least one honest server.
- Force a BroadcastRecord to contain a participants list that includes an unregistered ClientId: the state machine rejects such an aggregate at SubmitAggregate apply time (see committee state machine).
Anonymity sketch
Let C₁, ..., C_N be the clients participating in round r. Each client
C_i contributes msg_i ⊕ (XOR over servers of pad_ij) to the
aggregate. The aggregate is:
agg_r = XOR_i (msg_i ⊕ XOR_j pad_ij)
= (XOR_i msg_i) ⊕ (XOR_i XOR_j pad_ij)
The broadcast is agg_r ⊕ (XOR_j partial_j) where partial_j = XOR_i pad_ij. Substituting:
broadcast = (XOR_i msg_i) ⊕ (XOR_i XOR_j pad_ij) ⊕ (XOR_j XOR_i pad_ij)
= (XOR_i msg_i) // the inner pads cancel
So the broadcast is exactly the XOR of every client’s slotted message. Given the deterministic slot assignment, messages land in distinct slots (modulo collisions) and can be read back slot-by-slot.
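The cancellation can be checked mechanically with toy pads — any function both endpoints can compute from (client, server, round) satisfies the algebra; the stand-in below is not the real HKDF/AES pad:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for the real HKDF/AES pad: any deterministic function of
// (client, server, round) works for demonstrating the XOR algebra.
fn pad(client: u64, server: u64, round: u64, len: usize) -> Vec<u8> {
    (0..len)
        .map(|k| {
            let mut h = DefaultHasher::new();
            (client, server, round, k).hash(&mut h);
            h.finish() as u8
        })
        .collect()
}

fn xor(a: &mut [u8], b: &[u8]) {
    for (x, y) in a.iter_mut().zip(b) {
        *x ^= y;
    }
}

fn main() {
    let (n, n_s, round, len) = (3u64, 2u64, 7u64, 8usize);
    let msgs: Vec<Vec<u8>> = (0..n).map(|i| vec![i as u8 + 1; len]).collect();

    // agg = XOR_i (msg_i ⊕ XOR_j pad_ij)
    let mut agg = vec![0u8; len];
    for i in 0..n {
        let mut env = msgs[i as usize].clone();
        for j in 0..n_s {
            xor(&mut env, &pad(i, j, round, len));
        }
        xor(&mut agg, &env);
    }

    // broadcast = agg ⊕ XOR_j partial_j, where partial_j = XOR_i pad_ij
    for j in 0..n_s {
        let mut partial = vec![0u8; len];
        for i in 0..n {
            xor(&mut partial, &pad(i, j, round, len));
        }
        xor(&mut agg, &partial);
    }

    // Pads cancel: the broadcast is exactly XOR_i msg_i.
    let mut expect = vec![0u8; len];
    for m in &msgs {
        xor(&mut expect, m);
    }
    assert_eq!(agg, expect);
    println!("pads cancel");
}
```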
For unlinkability: given any one honest server j* whose pad secrets
are unknown to the adversary, every pad_{ij*} is PRF-indistinguishable
from uniform random (under the PRF security of the HKDF-AES
construction). Each honest client’s envelope_i = msg_i ⊕ XOR_j pad_ij
is therefore PRF-indistinguishable from uniform — the adversary cannot
distinguish which honest client authored which envelope. This is the
standard DC-net anonymity argument under the any-trust assumption.
The paper strengthens this to a (t, n) game (Appendix A). The
state-machine-level permutation check at SubmitAggregate apply ensures
the aggregate’s participants vector is a subset of the round’s client
roster: any participants shuffle by the adversary is a subset of
already-known IDs, so the permutation is within the honest anonymity set.
Integrity: what the state machine guarantees
- A committed BroadcastRecord is the result of exactly one SubmitAggregate followed by exactly one SubmitPartial per committee member in that round’s servers snapshot. No partial is double-counted; no aggregate is re-applied.
- Every published broadcast in the log is computable deterministically from the committed commands. A replay (e.g. after a committee server restart) produces the identical byte sequence.
Integrity: what the state machine does not guarantee
-
The honesty of the aggregator’s fold. A malicious aggregator can:
- omit an envelope (DoS a specific client),
- include a garbage envelope attributed to a real client’s
ClientId(see below), - lie about the
participantslist.
The state machine rejects a
SubmitAggregatewhoseparticipantsset is not a subset of theLiveRound.clientsroster, preventing the aggregator from naming rogue clients. It does not reject an aggregator that names honest clients whose envelopes were never received — but in that case the partial unblinds will remove the expected pads, and the slot of the missing client will show noise (sincemsg_i = 0was not what the honest client sent).A malicious aggregator cannot break anonymity; it can only degrade availability and introduce noise into specific slots.
- The honesty of a committee server’s partial. A malicious server can submit a garbage partial. The broadcast will be XORed with that garbage and published as garbage. The state machine has no way to detect this — DC-net unblinding does not carry a zero-knowledge proof. This is consistent with the paper: malicious servers break availability, not anonymity. A v2 mitigation (not in v1) is an anti-disrupter phase modeled on Riposte’s auditing or Blinder’s MPC format check.
Failure modes that break anonymity (not in the adversary model)
- All committee servers collude. By assumption the any-trust model is void; anonymity is lost. Operators must enforce the any-trust diversity axiom out of band.
- The same DH secret is used across roles. Re-using a
`DhSecret` between a committee server and a client (a pathological misconfiguration) would let the server correlate its own client envelopes with its own partial unblinds. The `ClientId` / `ServerId` type separation guards against this at the type level.
- Traffic analysis across rounds. ZIPNet per se does not defend against a global passive adversary who correlates client connection times across many rounds. This is a transport-level concern and is inherited from mosaik’s iroh transport.
- Universe-level co-location. Running on the shared mosaik
universe (Shape B in design-intro) does not
weaken the anonymity argument: admission to the committee group and
to the public write-side streams is gated per-instance by
`TicketValidator` composition (`BundleValidator<K>` today, plus `Tdx::new().require_mrtd(...)` in the TDX path). A peer on the universe who does not present the expected bundle — or MR_TD — is not admitted to the bond, and therefore cannot submit a `Command`, a partial, or a client envelope. The universe topology is a discovery-scope decision, not a trust-scope decision.
Denial-of-service surface
| Attacker | Attack | Effect |
|---|---|---|
| Compromised TEE | Flood envelopes | Aggregator backpressures, drops lagging stream senders (mosaik TooSlow code 10_413) |
| Compromised aggregator | Omit / delay aggregates | Rounds stall until the committee’s round_deadline fires |
| Compromised committee server | Omit partial | Round never finalizes; operator intervenes or the server is rotated out |
| Compromised committee server | Submit malformed partial | Broadcast is garbage for this round; next round is clean |
| Network | Drop / delay packets | Raft heartbeats time out, election thrashes, rounds delayed |
All of these are availability issues and none of them break anonymity of past or future rounds.
Roadmap to v2
audience: contributors
These are the simplifications baked into v1 and the planned path to address each. The order here is not the implementation order — it is the order in which each change affects the external behavior of the system.
Footprint scheduling
v1: deterministic slot per (client, round) via keyed-blake3 mod
`num_slots`. Per-client collision probability ≈ N / `num_slots`; at N = 8, `num_slots` = 64 that’s ~12%.
v2: the paper’s two-channel scheduling (§3.2). A side channel of
4 * N slots holds footprint reservations. Clients pick a random slot
and an f-bit random footprint each round, write the footprint into
the scheduling vector, and in round r+1 use the assigned message slot
only if their footprint round-tripped unchanged.
Implementation shape: add a second RoundParams::num_sched_slots and
a second broadcast vector, run the same HKDF-AES pad derivation against
a distinct label "zipnet/pad/sched/v1". The CommitteeMachine
consumes two aggregates per round (message + schedule) and splits the
final broadcast into two halves. WIRE_VERSION bump: 1 → 2.
Cover traffic
v1: non-talking clients omit their envelope entirely. This narrows the anonymity set to active talkers.
v2: clients with no message produce a pure-pad envelope (msg_i = 0,
all pads XORed in). The aggregator and committee process these
indistinguishably from talker envelopes. The only visible change at the
state-machine level: the `participants` list grows to include cover-traffic senders.
This is a tiny code change on the client (just remove the
“skip when `message == None`” early return in `client::seal`) plus a
policy decision on how often a client should send cover. Staying cheap
on the server was a first-class design goal of the paper; v2 makes it
concrete.
Ratcheting for forward secrecy
v1: every round reruns HKDF-Extract from the same shared_secret.
Compromise of the secret compromises all past pads.
v2: at the end of each round, both client and server ratchet:
shared_secret ← HKDF-Extract("zipnet/ratchet/v1", shared_secret);
Past shared secrets are unrecoverable from the new one under the PRF
assumption. Both sides must step the ratchet in lockstep; the round
number acts as the step counter. Committee members rederiving a missed
step for a late-joining client catch up by evaluating the KDF once per
elapsed round (`round` applications in total).
For the client, the ratchet state sits in the TEE’s sealed storage (v2 TDX path). For the mock client, it sits in RAM — so a restart re-derives an independent key tree, which is fine.
Multi-tier aggregators
v1: single aggregator.
v2: arbitrary rooted tree of aggregators. Each leaf-level aggregator
XOR-folds from its assigned clients, pushes up to its parent, parent
folds and pushes to root, root publishes to the committee. Filtering
uses `require(|p| p.tags().contains(&tag!("aggregator.tierN")))` and
`with_tags("aggregator.tierN+1")` on `online_when`.
Each aggregator-to-aggregator link uses a dedicated stream (we already
have the pattern in AggregateToServers). No state-machine change
required because the root aggregator still emits one AggregateEnvelope
per round.
Liveness resilience
v1: any committee server being offline halts round finalization —
the state machine waits for len(partials) == len(header.servers).
v2 options:
- Relaxed finalization. Finalize after `t`-of-`n` partials, where `t` is a configured threshold. A missing server’s pads are retroactively removed via a published “apology partial” submitted by any honest server that knows the remaining clients’ pads. (This requires publishing the missing server’s pad seeds under the committee’s shared secret, which defeats the point — so it needs MPC.)
- Aggregator-sponsored timeout. The leader signals a timeout, bumps the `RoundId`, and opens a fresh round without the stuck server’s pads. This is simpler but loses the anonymity contribution of the absent honest server.
The first option is research-complete but not engineering-complete; the second option is trivial and is the candidate for v2.
TDX attestation in the critical path
v1: tee-tdx feature exists but the committee accepts any peer
with a well-formed ClientBundle ticket (our BundleValidator only
checks id/dh_pub consistency).
v2: on each committee admission path add
`.require_ticket(Tdx::new().require_mrtd(expected_mrtd))` so only
enclave-verified peers can participate.
The expected MR_TD comes from the reproducible image build.
ClientRegistry writes only land if the bundle’s PeerEntry also
carries a valid TDX quote.
This is additive to the existing BundleValidator and stacks cleanly
thanks to mosaik’s multi-require_ticket support.
State archival and snapshot sync
v1: CommitteeMachine.broadcasts grows unbounded in RAM;
LogReplaySync is used for catch-up.
v2: implement a StateSync strategy that snapshots the last N
broadcasts + the current InFlight and emits a blob. Externalize the
archival of rotated broadcasts to a sink collection or a replicated
object store.
Rate-limiting tags
v1: absent. A malicious client can flood envelopes.
v2: per the paper’s §3.1 sketch, each envelope carries
PRF_k(ctr || epoch) where ctr is attested by the enclave. The
aggregator dedupes by tag per epoch. This requires the TEE path to have
landed first.
Scheduling vector equivocation protection
v1: a single leader publishes LiveRound into LiveRoundCell;
divergent schedules would be detectable via the schedule_hash input
to the KDF (if we included it — we pass NO_SCHEDULE in v1). Once
footprint scheduling lands, every client must derive schedule_hash
from the same broadcast schedule as the committee, or pads disagree and
the broadcast is noise (correct failure mode per paper §3.2).
Versioning under stable instance names
v1: every incompatible change (any WIRE_VERSION or
signature() bump) produces a new GroupId. Under the UNIVERSE +
instance-salt design described in
design-intro,
this effectively makes the old instance a ghost and forces consumers
to re-pin. If "acme.mainnet" is meant to be an operator-level
identity that outlives schema changes, v1 cannot deliver it.
v2 must pick one of two reconciliation strategies, documented in design-intro — Versioning under stable instance names:
- Version-in-name. `acme.mainnet-v2` retires `acme.mainnet`. Clean, but forces a consumer-side release per bump.
- Lockstep releases. The instance name stays stable across versions and operators + consumers cut matching releases against a shared deployment crate. Avoids id churn at the cost of tighter release-cadence coupling.
Neither is chosen yet. The call is forced the first time a v2 milestone above lands in a production deployment.
Cross-service composition
v1: zipnet is the only service we ship on zipnet::UNIVERSE.
v2: as sibling services (multisig signer, secure storage, attested oracles) land on the same universe, two concerns surface:
- Catalog noise. Every peer on the universe appears in every agent’s discovery catalog. `/mosaik/announce` volume scales with the universe, not with the services an agent cares about. The escape hatch is the per-service derived private network for high-churn internal chatter; the residual cost is paid by everyone. If a service’s traffic would dominate the shared network, it belongs behind its own `NetworkId` — Shape A in design-intro — Two axes of choice — not on the shared one.
- Cross-service atomicity. “Mix a zipnet message AND rotate a multisig signer” cannot be a single consensus transaction; they are different `Group`s, possibly with disjoint membership. If a coordination-heavy use case genuinely needs that, the answer is a fourth primitive that is itself a deployment providing atomic composition — not an ad-hoc cross-group protocol.
Optional directory collection (devops convenience)
Not a core feature. Zipnet’s consumer binding path is compile-time
name reference plus mosaik peer discovery; no on-network
registry is required, and the
CLAUDE.md commitment is explicit that one will
not be added. However, a shared Map<InstanceName, InstanceCard>
listing known deployments may ship as a devops convenience for
humans enumerating instances across operators. If built, it must:
- be documented as a convenience, not a binding path;
- be independently bindable — the SDK never consults it;
- not become load-bearing for ACL or attestation decisions.
Flag it in source as `// CONVENIENCE:` if it lands, to distinguish it
from the `// SIMPLIFICATION:` v2-deferred markers.
Migration across these milestones
Each milestone above changes WIRE_VERSION or at minimum
CommitteeMachine::signature(). Rolling between v1 and an arbitrary
v2 milestone is therefore a coordinated “stop all nodes, start with new
config” operation — same procedure as
rotating the committee secret.
We make no attempt at on-the-fly upgrade paths in this prototype.
Extending zipnet
audience: contributors
This chapter covers two kinds of extension:
- Extending zipnet itself — new commands, collections, streams, ticket classes, or round-parameter knobs within a zipnet deployment.
- Building an adjacent service on the shared universe — a new
mosaik-native service (multisig signer, secure storage, attested
oracle, …) that coexists with zipnet on
zipnet::UNIVERSEand reuses the instance-salt pattern.
The second is the generalisation of the first. The “checklist for a new service” at the end of design-intro is the canonical reference for the second kind; this chapter links to it and concentrates on the concrete how-tos.
Extending zipnet itself
Adding a new command to the committee state machine
- Add a variant to `Command` in `crates/zipnet-node/src/committee.rs`.
- Handle it in `apply()`. Deterministic only — no I/O, no randomness that isn’t derived from `ApplyContext` (see Committee state machine — Apply-context usage).
- Bump the version tag in `CommitteeMachine::signature()` (`v1` → `v2`). This re-scopes the `GroupId` so mismatched nodes cannot bond. This is a breaking change.
- Add a `Query` variant if the new state needs external read access.
- Decide who issues the command. If a non-server peer needs to trigger it, add a `declare::stream!` channel and a side-task in `roles::server` that feeds it into `group.execute`.
Adding a new collection
- Declare in `crates/zipnet-node/src/protocol.rs`:

  declare::collection!(
      pub MyMap = mosaik::collections::Map<K, V>,
      "zipnet.collection.my-map",
  );

- Decide writer and reader roles. Writers join the collection’s internal Raft group and bear the leadership election cost.
- For TDX-gated collections, compose `Tdx::new().require_mrtd(...)` onto the collection’s `require_ticket` alongside the existing `BundleValidator` — see Mosaik integration — TDX gating.
- If the new collection is part of the public surface, think twice. Zipnet’s declared public surface is small (write-side + read-side, see Architecture). A new public collection widens the consumer contract; prefer surfacing via `Zipnet::bind` instead of growing raw declarations.
- Once the target per-instance layout lands, the literal string will be replaced by `INSTANCE.derive("my-map")`; structure the name so the migration is a pure rename.
Adding a new typed stream
- Declare in `protocol.rs`. Prefix predicates with `producer` / `consumer` per the direction semantics (Mosaik integration — predicate direction).
- Use in a role module: `MyStream::producer(&network)` / `MyStream::consumer(&network)` return concrete typed handles.
- If this is a high-churn internal channel (aggregator fan-in, DH gossip), it’s a candidate to live on a derived private network rather than the shared universe — see Architecture — Internal plumbing.
Adding a new TicketValidator
- Implement `mosaik::tickets::TicketValidator` on a fresh type. `BundleValidator<K>` in `crates/zipnet-node/src/tickets.rs` is the reference shape.
- Pick a `TicketClass` constant. Keep it human-readable (`"zipnet.bundle.server"`, etc.) — ticket classes are intent-addressed and the string is the intent.
- Fold a version tag into `signature()` the same way `BundleValidator` does:

  fn signature(&self) -> UniqueId {
      K::CLASS.derive("zipnet.my-validator.v1")
  }

  Bumping `v1` → `v2` re-scopes the `GroupId` of every group that stacks this validator. Treat it as a breaking change.
- Compose with existing validators via mosaik’s multi-`require_ticket` — see Mosaik integration — TDX gating for the stacking pattern.
Changing RoundParams
- Edit `RoundParams::default_v1()` in `crates/zipnet-proto/src/params.rs`.
- Bump `WIRE_VERSION` if the change is semantically meaningful (any client/server disagreement on shape would garble pads otherwise).
- `CommitteeMachine::signature()` already mixes in params fields; every member rederives `GroupId` and old + new do not bond.
- Deploy-time coordination: same procedure as rotating the committee secret.
Adding a TDX attestation requirement
- Turn on the `tee-tdx` feature on `zipnet-node`, `zipnet-server`, `zipnet-client`.
- In the deployment-specific `main`, pre-compute (or hardcode) the expected MR_TD.
- Build a validator:

  use mosaik::tickets::Tdx;
  let validator = Tdx::new().require_mrtd(expected_mrtd);

- Plumb `validator` into the server’s `run` path by stacking it on the committee `GroupBuilder::require_ticket` and on each collection / stream whose producer you want to TDX-gate.
Swapping the slot assignment function
- The slot is picked by `zipnet_core::slot::slot_for(client, round, params)`. Change the body; the caller contract is `-> usize`.
- If you want the footprint scheduling variant, you’ll also want a per-round side channel — see Roadmap — Footprint scheduling.
- Keep it deterministic and agreed upon by all nodes. Bump the protocol version tags accordingly.
Running the integration test under heavier parameters
crates/zipnet-node/tests/e2e.rs uses RoundParams::default_v1()
and a hardcoded 3-server / 2-client topology. Modify directly; the
helpers (cross_sync, run_server, run_client, run_aggregator)
are scoped to the test so no cross-cutting refactor is needed.
RUST_LOG=info,zipnet_node=debug cargo test -p zipnet-node --test e2e -- --nocapture
A successful run ends with
zipnet e2e: round r1 finalized with 2/2 messages recovered
Where to put a new role
If you introduce a fourth participant type (say, an “auditor” that
archives Broadcasts to cold storage), the idiomatic placement is a
new module in crates/zipnet-node/src/roles/ and a sibling crate
under crates/zipnet-auditor/ that delegates to it. Follow the
zipnet-aggregator binary layout.
Measuring something
Mosaik’s Prometheus metrics are auto-wired; add your own via the
metrics crate:
use metrics::{counter, gauge};
counter!("zipnet_rounds_opened_total").increment(1);
gauge!("zipnet_client_registry_size").set(registry.len() as f64);
They will appear at the configured ZIPNET_METRICS endpoint without
any scraper-side changes.
Building an adjacent service on the shared universe
Zipnet’s deployment model is a reusable pattern — the full rationale
is in design-intro. Any service that wants to
coexist on zipnet::UNIVERSE alongside zipnet should reproduce the
three conventions:
- Instance-salt discipline. Every public id descends from `blake3("yourservice." + instance_name)`. Provide both a compile-time macro and a runtime fn that produce byte-identical output.
- A Deployment-shaped convention. Declare the public surface (one or two primitives, ideally) in a single protocol module; export a `bind(&Network, instance_name) -> TypedHandles` function.
- A naming convention, not a registry. Operator → consumer handshake is universe `NetworkId` + instance name + (if TDX-gated) MR_TD. No on-network advertisement required — mosaik’s standard discovery bonds the sides.
Walk the
checklist for a new service
end-to-end before writing any code. The most common mistake is not
answering “what happens when StateMachine::signature() bumps?”
before shipping.
When Shape B is the wrong call
A service whose traffic would dominate catalog gossip on the shared
universe (high-frequency metric streams, bulk replication) belongs
behind its own NetworkId — Shape A in
design-intro — Two axes of choice.
The narrow-public-surface discipline does not rescue a service
whose steady-state traffic is inherently loud; at that point the
noise cost dominates the composition benefit.
Optional directory collection
If your operator community wants a human-browsable list of known
deployments, ship a sibling Map<InstanceName, InstanceCard> as a
devops convenience, not as part of the consumer binding path. See
Roadmap — Optional directory collection
for the discipline.
Glossary
audience: all
Domain terms as they are used in this book and in the source.
Aggregator. The untrusted node that XOR-folds client envelopes for
a round into a single AggregateEnvelope and forwards it to the
committee. One aggregator in v1; a tree of aggregators in v2. Runs
inside zipnet-aggregator.
Any-trust. Security assumption where anonymity holds as long as at least one party in a designated set is honest. The zipnet committee is an any-trust set.
bind. The Zipnet::bind(&Arc<Network>, &str) constructor —
the single public path from a mosaik network handle to a typed zipnet
handle. Takes an instance name; derives every instance-local ID
internally; returns a Zipnet that exposes publish / subscribe /
shutdown. See Quickstart — publish and read.
bind_by_id. The Zipnet::bind_by_id(&Arc<Network>, UniqueId)
variant of bind, for consumers who have pre-derived the instance
UniqueId at compile time via zipnet::instance_id!("name"). The
macro and runtime instance_id fn produce identical bytes, so a
compile-time bind_by_id and a string bind with the same name land
on the same instance.
Bond. mosaik term for a persistent QUIC connection between two
members of the same Raft group, authenticated by the shared
GroupKey.
Broadcast vector. B = num_slots * slot_bytes bytes of output
per round. Default 16 KiB. Each finalized round commits one broadcast
vector to the Broadcasts collection.
Client. A node that authors messages and seals them into envelopes inside a TEE. In the mock path (v1 default), the TEE is replaced by a plain process; see Security checklist.
ClientBundle. Public pair (ClientId, dh_pub) gossiped via a
discovery ticket so servers can derive per-client pads.
ClientId. 32-byte blake3-keyed hash of the client’s X25519
public key. Stable as long as the client’s DH secret is stable.
Committee. The set of any-trust servers that collectively unblind
the round’s aggregate. In v1 this is a Raft group with a bespoke
CommitteeMachine state machine. One committee per instance.
Cover traffic. Client envelopes carrying a zero message, sent to widen the anonymity set at negligible extra cost. The SDK sends cover envelopes by default when an instance is bound but idle. See Publishing messages.
DC net. Dining Cryptographers network — the XOR-based anonymous broadcast construction zipnet descends from. See Chaum 1988.
DH secret. An X25519 static secret held by a client or a server. Compromise of one party’s DH secret only affects that party; compromise of every committee server’s DH secret breaks anonymity.
Encrypted mempool. The canonical motivating deployment shape: TEE-attested wallets seal transactions and publish them through zipnet; builders read the ordered log of sealed transactions; no single party can link a transaction back to its sender. Zipnet supplies the anonymous publish channel; the encryption of the payload itself (threshold, TEE-unsealing, etc.) sits on top.
Envelope. A client’s per-round contribution: a broadcast-vector-sized
buffer containing message ‖ tag at the client’s slot and zeros
elsewhere, XORed with the sum of the client’s per-server pads.
Falsification tag. A keyed-blake3 output of the plaintext message, written alongside the message in the same slot. Verifies that a slot’s payload is intact (§3, “ROMHash” in the paper).
Fold. The aggregator’s XOR combine of all envelopes for a round.
Footprint scheduling. The paper’s two-channel slot reservation scheme (§3.2). v2 feature.
GroupId. mosaik’s 32-byte identifier for a Raft group, derived
from the GroupKey, consensus config, state machine signature, and
any TicketValidator signatures. Fully determined by the instance
name plus the deployment crate version.
GroupKey. Shared committee secret. Admission gate for joining
the committee’s Raft group.
Instance. A single zipnet deployment — one committee, one ACL, one set of round parameters — sharing a universe with other zipnet instances and other mosaik services. Operators stand up and retire instances; users bind to them by name.
Instance name. A short, stable, namespaced string that
identifies an instance within a universe (e.g. acme.mainnet,
preview.alpha, dev.ops). Folds deterministically into every
instance-local ID. Flat namespace per universe — collisions are
silent, so namespace defensively (<org>.<purpose>.<env>).
instance_id. Runtime function and macro on the zipnet facade
that derive an instance’s root UniqueId from its name.
zipnet::instance_id("acme.mainnet") and
zipnet::instance_id!("acme.mainnet") produce identical bytes —
both expand to blake3("zipnet.acme.mainnet"). Sub-IDs chain off it
via .derive("submit" | "broadcasts" | "committee" | …).
LiveRound. The currently-open round’s header: round id, client
roster snapshot, server roster snapshot.
mosaik. The Flashbots library on which this prototype is built. Provides discovery, typed streams, consensus groups, and replicated collections. See docs.mosaik.world.
MR_TD. 48-byte Intel TDX guest measurement. Published by the
operator out of band; pinned by clients; enforced by the mosaik
Tdx bonding layer. See
TEE-gated deployments.
Pad. The output of the KDF for a given (client, server, round)
triple; length B. XOR of pads is the DC-net’s one-time key.
Partial unblind. One committee server’s XOR of its per-client pads over the round’s participant set. XORing all partials into the aggregate yields the broadcast.
PeerId. mosaik identifier for a node: its ed25519 public key
(via iroh). Different from ClientId / ServerId (which are
DH-key-based).
Raft. The consensus protocol used by the committee group. mosaik uses a modified Raft with abstention votes.
Ratchet. Stepping the shared secret forward one round;
shared_secret ← HKDF(shared_secret). Provides forward secrecy. v2
feature.
Round. One execution of the protocol:
OpenRound → SubmitAggregate → N_S × SubmitPartial → finalize.
RoundId. Monotonically increasing integer; r0, r1, ....
RoundParams. Static shape of a round: num_slots, slot_bytes,
tag_len, wire_version. Immutable for the lifetime of an instance.
ServerBundle. Public pair (ServerId, dh_pub) gossiped via a
discovery ticket so clients can derive per-server pads.
ServerId. 32-byte blake3-keyed hash of a committee server’s
X25519 public key.
Slot. One slot_bytes-byte region of the broadcast vector. One
active client per slot per round (modulo deterministic collisions).
State machine signature. UniqueId mixed into GroupId
derivation. Bumped whenever apply semantics or Command shape
changes.
TEE. Trusted Execution Environment. Intel TDX in the production path; mock in the v1 default path.
TDX. Intel Trust Domain Extensions — the TEE zipnet targets.
Guest measurement is MR_TD. See
TEE-gated deployments.
Ticket. Opaque bytes attached to a signed PeerEntry in mosaik
discovery. Zipnet uses tickets of classes zipnet.bundle.client and
zipnet.bundle.server to distribute DH pubkeys, and relies on
mosaik’s require_ticket for per-instance ACL on the public
primitives.
Universe. The shared mosaik NetworkId on which zipnet (and any
other mosaik service) runs. The zipnet facade exports the constant
zipnet::UNIVERSE = unique_id!("mosaik.universe"). Many instances,
and many unrelated services, coexist on one universe.
XOR. Exclusive-or over equal-length byte buffers. The DC-net’s fundamental operation.
Paper cross-reference
audience: contributors
Pointer table from the prototype’s source modules to the ZIPNet paper (eprint 2024/1227). Section / algorithm / figure numbers are from the camera-ready version. Crate paths are workspace-relative.
| Paper item | Prototype location |
|---|---|
| §2.1 “Chaum’s DC net” (background) | zipnet-proto::xor (crates/zipnet-proto/src/xor.rs) |
| §2.2 “ZIPNet overview” (Figure 1b) | crates/zipnet-node/src/lib.rs diagram + Architecture |
| §3 “Falsifiable TEE assumption” | zipnet-proto::crypto::falsification_tag (crates/zipnet-proto/src/crypto.rs) |
| §3 “Setup” (PKI, attestation, sealed key DB) | zipnet-proto::keys + zipnet-node::tickets::BundleValidator (crates/zipnet-node/src/tickets.rs) |
| §3 “Sealed data” | v2 sealed storage in TEE; not implemented in v1 |
| §3.1 “Rate limiting tags” | v2 item; not implemented |
| §3.2 “Scheduling” (footprint) | v2 item; not implemented (see roadmap) |
| §3.3 “Adversary and network model” | Threat model |
| §3.3 “Security argument” | Threat model — anonymity sketch |
| Algorithm 1 (client seal) | zipnet-core::client::seal (crates/zipnet-core/src/client.rs) |
| Algorithm 2 (aggregator fold) | zipnet-core::aggregator::RoundFold (crates/zipnet-core/src/aggregator.rs) |
| Algorithm 3 (server partial + finalize) | zipnet-core::server::partial_unblind + zipnet-core::server::finalize (crates/zipnet-core/src/server.rs) |
| Appendix A (anonymous broadcast definition) | inherited — the prototype does not reprove it |
Crate responsibilities
The workspace splits the paper’s constructions along a purity boundary (see Crate map):
| Crate | Paper content | I/O? |
|---|---|---|
zipnet-proto | Wire types, keys, XOR, falsification tag primitive | No |
zipnet-core | Algorithms 1 / 2 / 3 as pure functions over zipnet-proto types | No |
zipnet-node | The mosaik integration — CommitteeMachine, role event loops, TicketValidator | Yes |
zipnet-server / zipnet-aggregator / zipnet-client | Thin CLI wrappers around zipnet-node::roles::{server, aggregator, client} | Yes |
zipnet | SDK facade (Zipnet::bind, UNIVERSE, instance_id!); wraps zipnet-node for external consumers | Yes |
zipnet-proto and zipnet-core do not import mosaik or tokio;
if a paper construction reaches for either, it is in the wrong crate.
Notation
The paper uses capital N (total users), N_S (servers), |m|
(slot bytes), B (broadcast vector bytes). The prototype uses
lowercase n / num_slots / slot_bytes / broadcast_bytes in
code and generally follows the paper’s naming in comments.
Deliberate deviations from the paper
- No schedule hash in v1. The paper mixes `publishedSchedule` into the KDF salt. The prototype passes a constant `NO_SCHEDULE = [0u8; 32]` in v1 and will replace it with the real schedule hash when footprint scheduling lands. Binding the schedule into the KDF is already plumbed (`crypto::kdf_salt` takes it as an argument), so the upgrade is a caller-site change.
- Tag is keyed-blake3, not HMAC. The paper writes “ROMHash” informally; the prototype picks keyed-blake3 with a fixed domain-separating label for performance. Both are PRFs under standard assumptions; no security difference relative to the paper’s ROM-based argument.
- No traitor tracing protocol. The paper’s §3 suggests that any malformed message flips hash bits and is detected with overwhelming probability. v1 only checks tags on observation; an adversarial client writing to an unused slot is visible via tag mismatch but not attributed. This matches the paper’s “falsifiable trust assumption” but does not implement the §3.1 rate-limiting PRF tags.
- Anonymous broadcast channel for scheduling. The paper runs a second DC net for reservations (§3.2). v1 runs only the message channel.
- Instance namespacing replaces paper-implicit single-deployment
identity. The paper treats a ZIPNet committee as a single global
entity. The prototype runs many instances side by side on a
shared mosaik universe, each with its own salt (see
Designing coexisting systems on mosaik).
No paper construction is changed by this; every derivation folds
`instance_id` in where the paper has an implicit single “deployment” constant.
Environment variables
audience: both
Variables that every binary respects, plus role-specific ones. All
are optional unless marked Required. Values are passed either as
an env var or as the corresponding CLI flag; an explicitly passed flag
beats the env var when both are set (standard `clap(env = "...")` precedence).
Users do not read this page — the SDK takes no env vars. This is an operator reference. When it diverges from what a binary currently parses, the binary is lagging the documented deployment model; align the binary to this page, not the other way around.
Common to every binary
| Variable | CLI flag | Default | Description |
|---|---|---|---|
ZIPNET_INSTANCE | --instance | Required | Instance name for this deployment (e.g. acme.mainnet). Folds into committee GroupId, submit StreamId, broadcasts StoreId. All processes of one deployment must share this value. |
ZIPNET_UNIVERSE | --universe | zipnet::UNIVERSE (mosaik.universe) | Override the shared mosaik universe NetworkId. Set only for isolated federations; leave unset for normal deployments. |
ZIPNET_BOOTSTRAP | --bootstrap | (none) | Comma- or repeat-flag-separated PeerIds on the shared universe to dial on startup. Universe-level, not per-instance. |
ZIPNET_METRICS | --metrics | (none) | Prometheus exporter bind address, e.g. 0.0.0.0:9100. |
ZIPNET_SECRET | --secret | (random) | Seed for this node’s iroh secret. Anything not 64-hex is blake3-hashed. Recommended on committee servers and the aggregator for stable PeerId. |
RUST_LOG | — | info,zipnet_node=debug | Standard tracing_subscriber filter. |
zipnet-server
| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_COMMITTEE_SECRET` | `--committee-secret` | Required | Shared committee admission secret. Treated as a root credential — all committee servers of the same instance must share this value; clients and the aggregator must not have it. |
| `ZIPNET_MIN_PARTICIPANTS` | `--min-participants` | `1` | Minimum registered clients before the leader opens a round. |
| `ZIPNET_ROUND_PERIOD` | `--round-period` | `2s` | How often the leader attempts to open a new round. |
| `ZIPNET_ROUND_DEADLINE` | `--round-deadline` | `6s` | How long a round may stay open before the leader force-advances. |
zipnet-aggregator
| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_FOLD_DEADLINE` | `--fold-deadline` | `2s` | Time window after a round opens in which the aggregator accepts envelopes. |
zipnet-client
| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_MESSAGE` | `--message` | (none) | UTF-8 message to seal each round. Omit to run as cover traffic. |
| `ZIPNET_CADENCE` | `--cadence` | `1` | Talk every Nth round (`1` = every round). |
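The cadence semantics can be sketched as a tiny predicate. This is a hypothetical helper, not zipnet's actual code, and zero-based round numbering is an assumption:

```rust
// Hypothetical sketch of ZIPNET_CADENCE semantics: talk on every Nth round,
// run as cover traffic otherwise. Zero-based round numbering is an
// assumption here, not a confirmed zipnet detail.
fn talks_this_round(round: u64, cadence: u64) -> bool {
    cadence != 0 && round % cadence == 0
}

fn main() {
    // cadence = 1: talk every round.
    assert!((0..5).all(|r| talks_this_round(r, 1)));
    // cadence = 3: talk on rounds 0, 3, 6, ...
    let talking: Vec<u64> = (0..8).filter(|&r| talks_this_round(r, 3)).collect();
    assert_eq!(talking, vec![0, 3, 6]);
}
```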
Duration syntax
The duration parsers accept `Nms`, `Ns`, and `Nm` (e.g. `500ms`, `2s`,
`1m`). Hours and days are not supported; if you need them, file an
issue.
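A minimal re-implementation of that grammar, as a sketch only (the real parser lives inside the zipnet binaries; `parse_duration` here is hypothetical):

```rust
use std::time::Duration;

// Hypothetical re-implementation of the documented grammar: Nms, Ns, Nm.
fn parse_duration(s: &str) -> Result<Duration, String> {
    // Check the two-character unit first so "500ms" is not read as "500m" + "s".
    let (num, unit_ms) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1u64)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1_000)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 60_000)
    } else {
        return Err(format!("missing unit in {s:?} (use ms, s, or m)"));
    };
    let n: u64 = num.parse().map_err(|_| format!("bad number in {s:?}"))?;
    Ok(Duration::from_millis(n * unit_ms))
}

fn main() {
    assert_eq!(parse_duration("500ms").unwrap(), Duration::from_millis(500));
    assert_eq!(parse_duration("2s").unwrap(), Duration::from_secs(2));
    assert_eq!(parse_duration("1m").unwrap(), Duration::from_secs(60));
    assert!(parse_duration("1h").is_err()); // hours are rejected, as documented
}
```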
Secret syntax
All “secret”-style inputs (`ZIPNET_SECRET`, `ZIPNET_COMMITTEE_SECRET`)
follow the same rule:
- Exactly 64 hex characters → decoded as 32 raw bytes.
- Anything else → blake3-hashed into 32 bytes.

This matches mosaik’s own secret-key handling, so operators can reuse
whatever seed format they already have (e.g. `openssl rand -hex 32`).
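The rule can be sketched as follows. The 64-hex fast path is decoded for real with the standard library; the blake3 branch is only signalled, since blake3 is an external crate, and the `SecretSeed`/`classify_secret` names are hypothetical:

```rust
// Sketch of the documented secret rule: exactly 64 hex characters decode to
// 32 raw bytes; anything else would be blake3-hashed into 32 bytes.
enum SecretSeed {
    Raw([u8; 32]),     // input was exactly 64 hex characters
    NeedsHash(String), // anything else: blake3-hash this into 32 bytes
}

fn classify_secret(input: &str) -> SecretSeed {
    if input.len() == 64 && input.chars().all(|c| c.is_ascii_hexdigit()) {
        let mut out = [0u8; 32];
        for (i, byte) in out.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&input[2 * i..2 * i + 2], 16).expect("checked hex");
        }
        SecretSeed::Raw(out)
    } else {
        SecretSeed::NeedsHash(input.to_owned())
    }
}

fn main() {
    // e.g. the output of `openssl rand -hex 32`
    match classify_secret(&"ab".repeat(32)) {
        SecretSeed::Raw(bytes) => assert_eq!(bytes, [0xab; 32]),
        SecretSeed::NeedsHash(_) => panic!("64 hex chars should decode raw"),
    }
    // A passphrase falls through to the hashing branch.
    assert!(matches!(classify_secret("hunter2"), SecretSeed::NeedsHash(_)));
}
```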
Instance derivation
Every instance-local ID is derived from `ZIPNET_INSTANCE`:

```text
INSTANCE   = blake3("zipnet." + ZIPNET_INSTANCE)  // UniqueId
SUBMIT     = INSTANCE.derive("submit")            // StreamId
BROADCASTS = INSTANCE.derive("broadcasts")        // StoreId
COMMITTEE  = INSTANCE.derive("committee")         // GroupKey material
...
```
The consumer-side `zipnet::instance_id!("name")` macro produces the
same bytes as the server-side `ZIPNET_INSTANCE=name` derivation, so
a typo on either side lands on a `GroupId` nobody serves. The
failure mode is `Error::ConnectTimeout` on the client, not a
distinct “not found” error — zipnet has no on-network registry.
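The derivation chain and the both-sides-agree property can be sketched with a stand-in hash. The real IDs are 32-byte blake3 outputs; `DefaultHasher` is used here only so the sketch runs without external crates, and the `UniqueId`/`instance_id` names below are illustrative, not zipnet's API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the blake3-based UniqueId; only the *shape* of the
// derivation chain mirrors the documentation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UniqueId(u64);

impl UniqueId {
    fn new(input: &str) -> Self {
        let mut h = DefaultHasher::new();
        input.hash(&mut h);
        UniqueId(h.finish())
    }
    fn derive(&self, label: &str) -> Self {
        let mut h = DefaultHasher::new();
        self.0.hash(&mut h);
        label.hash(&mut h);
        UniqueId(h.finish())
    }
}

// Both the server-side env derivation and the consumer-side macro reduce to
// the same pure function of the instance name.
fn instance_id(name: &str) -> UniqueId {
    UniqueId::new(&format!("zipnet.{name}"))
}

fn main() {
    let a = instance_id("acme.mainnet");
    assert_eq!(a, instance_id("acme.mainnet")); // same name → same IDs
    assert_ne!(a, instance_id("acme.mainet")); // a typo lands elsewhere
    assert_ne!(a.derive("submit"), a.derive("broadcasts")); // disjoint streams
}
```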
Two deployments with different `ZIPNET_INSTANCE` values on the same
universe are completely independent committees: disjoint
`GroupId`s, disjoint streams, no crosstalk. Useful for:
- running dev/staging/prod in one machine pool,
- running per-tenant deployments on shared hardware,
- running a public testnet (`preview.alpha`) alongside production (`mainnet`).

Instance names share a flat namespace per universe — two operators
picking the same name collide in the committee group, and neither
works correctly. Namespace defensively
(`<org>.<purpose>.<env>`, e.g. `acme.mixer.mainnet`).
Universe override (ZIPNET_UNIVERSE)
Default is the shared mosaik universe (zipnet::UNIVERSE = unique_id!("mosaik.universe")). Override only when running an
isolated federation that intentionally does not share peers with the
rest of the mosaik ecosystem. Every server, aggregator, and client
of one deployment must agree on this value; consumers of the SDK
build against zipnet::UNIVERSE unless their code explicitly passes
a different NetworkId to Network::new.
Metrics reference
audience: operators
Every zipnet binary exposes a Prometheus endpoint when
ZIPNET_METRICS is set. The table below lists the metrics worth
scraping in production. Metrics starting with mosaik_ are emitted
by the underlying mosaik library and documented in the
mosaik book — Metrics;
the ones that are load-bearing for zipnet operations are listed here.
Metrics that are instance-scoped carry an instance label whose value
is the operator’s ZIPNET_INSTANCE string (e.g. acme.mainnet).
When a host multiplexes several instances (see
Operator quickstart — running many instances),
every instance-scoped metric is emitted once per instance.
Per-role metrics
Committee server
| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `mosaik_groups_leader_is_local{instance=<name>}` | gauge (0/1) | Whether this node is the Raft leader for the instance | Exactly one `1` across the committee of each instance |
| `mosaik_groups_bonds{peer=<id>,instance=<name>}` | gauge (0/1) | Whether a bond to a specific peer is healthy | `1` for every other committee member of the same instance |
| `mosaik_groups_committed_index{instance=<name>}` | gauge | Highest committed Raft index | Monotonically increasing, step ≈ 2 per round |
| `zipnet_rounds_finalized_total{instance=<name>}` | counter | Rounds this node saw finalize | Increases at ~1 per `ZIPNET_ROUND_PERIOD` |
| `zipnet_partials_submitted_total{instance=<name>}` | counter | Partials this node contributed | Increases by 1 per round |
| `zipnet_client_registry_size{instance=<name>}` | gauge | Clients currently registered | Roughly equals the expected client count |
| `zipnet_server_registry_size{instance=<name>}` | gauge | Servers currently registered | Equals committee size |
The mosaik_groups_leader_is_local gauge is the one the operator
quickstart tells you to check when bringing a new instance up —
exactly one committee node should report 1 per instance.
Aggregator
| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `mosaik_streams_consumer_subscribed_producers{stream=<id>,instance=<name>}` | gauge | Number of producers this consumer is attached to | = client count for `ClientToAggregator` |
| `mosaik_streams_producer_subscribed_consumers{stream=<id>,instance=<name>}` | gauge | Number of consumers attached to this producer | = committee size for `AggregateToServers` |
| `zipnet_aggregates_forwarded_total{instance=<name>}` | counter | Aggregates sent to the committee | ≈ rounds finalized |
| `zipnet_fold_participants{round=<r>,instance=<name>}` | histogram | Clients per folded round | Depends on your client count |
| `zipnet_clients_registered_total{instance=<name>}` | counter | Client bundles mirrored into `ClientRegistry` | Grows to the client count, then plateaus |
Client
| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `zipnet_envelopes_sent_total{instance=<name>}` | counter | Envelopes sealed and pushed | Increases by 1 per talk round |
| `zipnet_envelope_send_errors_total{instance=<name>}` | counter | Send failures | Ideally 0 |
| `zipnet_client_registered{instance=<name>}` | gauge (0/1) | Whether our bundle is in `ClientRegistry` | `1` after the first few seconds |
Metrics that indicate trouble
| Metric | Fires when | First action |
|---|---|---|
| `mosaik_groups_leader_is_local` is `1` on zero or ≥ 2 nodes of one instance for > 1 min | Split-brain or no leader | Incident response — split-brain |
| `mosaik_streams_consumer_subscribed_producers` drops to 0 on the aggregator | Clients disconnected | Check client-side logs for bootstrap failures |
| `zipnet_aggregates_forwarded_total` flat for > 3 × `ZIPNET_ROUND_PERIOD` | Aggregator stuck or committee cannot open rounds | Incident response — stuck rounds |
| `zipnet_server_registry_size` < committee size for > 30 s | A committee server failed to publish | Check that server’s boot log |
| `mosaik_groups_committed_index` frozen | Raft stalled | Check clock skew, network partition |
Every trouble alert should be scoped by instance so multi-instance
hosts do not conflate a stuck testnet with a stuck production
committee.
Recording rules for Prometheus
Useful derived series (all scoped by instance):
```promql
# Round cadence per instance
rate(zipnet_rounds_finalized_total[5m])

# Average participants per round per instance
rate(zipnet_fold_participants_sum[5m])
  / rate(zipnet_fold_participants_count[5m])

# Aggregator fold saturation (clients dropped by the deadline)
(
  rate(zipnet_clients_registered_total[5m])
  -
  rate(zipnet_fold_participants_sum[5m]) / rate(zipnet_rounds_finalized_total[5m])
)
```
Logs that should never fire (without a concurrent alert)
- `rival group leader detected` on any committee server.
- `SubmitAggregate with bad length` / `SubmitPartial with bad length` in a committee log.
- `failed to mirror LiveRoundCell` persistently.
- `committee offline — aggregate dropped` — either the committee is down or bundle tickets never replicated.
If any of these fire without a concurrent incident, treat it as a protocol invariant break and escalate to the contributor on-call.