Introduction

audience: all

Zipnet is an anonymous broadcast channel for bounded sets of authenticated participants. A group of clients publish messages onto a shared log; nobody — not even the operators of the infrastructure, acting individually — can tell which client authored which message.

This book documents a working prototype of ZIPNet built as a mosaik-native application. The protocol follows Rosenberg, Shih, Zhao, Wang, Miers, and Zhang (2024) with a small, grep-able set of v1 simplifications tracked in Roadmap to v2.

What zipnet is for

The canonical motivating case is an encrypted mempool: TEE-attested wallets seal transactions and publish them through zipnet; builders read an ordered log of sealed transactions; no party — not even a compromised builder — can link a transaction back to its author until on-chain execution reveals whatever the transaction itself reveals. The encryption layer (threshold decryption, TEE unsealing, plaintext-if-you-want) sits on top; zipnet supplies the anonymous, ordered, sybil-resistant publish channel underneath.

Other deployments in the same shape:

  • Permissioned order-flow auctions. Whitelisted searchers publish intents; builders bid without knowing which searcher sent what.
  • Anonymous governance signalling. Token-holder wallets cast signals a delegate can tally without learning which wallet sent any given one.
  • Private sealed-bid auctions. Bidders publish; outcomes are public; bid-to-bidder linkage is cryptographic.

What zipnet uniquely provides across these:

  • Sender anonymity within an attested set. A compromised reader cannot tie a message back to its author unless every committee operator colludes (any-trust).
  • Shared ordered view. Every subscriber sees the same log in the same order.
  • Sybil resistance. Only TEE-attested clients can publish.
  • Censorship resistance at the publish layer. Readers cannot drop messages from specific authors because authorship is unlinkable.

The deployment model in one paragraph

Zipnet runs as one service among many on a shared mosaik universe — a single NetworkId (zipnet::UNIVERSE) that hosts zipnet alongside other mosaik services (signers, storage, oracles). An operator stands up an instance under a short, namespaced string (e.g. acme.mainnet); multiple instances coexist on the same universe, each with its own committee, ACL, and round parameters. Consumers bind to an instance by name with one line of Rust: Zipnet::bind(&network, "acme.mainnet"). There is no on-network registry; the operator publishes the instance name (and, if TDX-gated, the committee MR_TD) via release notes or docs, and consumers compile it in.

The full rationale is in Designing coexisting systems on mosaik.

Three audiences, three entry points

This book is written for three distinct readers: users, operators, and contributors. Every page declares its audience on the first line and respects that audience’s tone. Pick the entry point that matches you.

See Who this book is for for the tone conventions each audience is held to.

What this prototype is

  • A permissioned, any-trust broadcast system: anonymity is preserved as long as at least one committee server is honest; liveness requires every committee server to be honest (in v1).
  • Real cryptography — X25519 Diffie–Hellman, HKDF-SHA256, AES-128-CTR pad generation, blake3 falsification tags, ed25519 peer signatures (via iroh).
  • Real consensus — the committee runs a modified Raft through mosaik’s Group<CommitteeMachine>.
  • Real networking — the aggregator and the committee communicate through mosaik typed streams; discovery is gossip + pkarr + mDNS; transport is iroh / QUIC.
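The any-trust property above comes down to XOR algebra. A toy sketch (std only; fixed byte pads stand in for the real X25519 + HKDF-SHA256 + AES-128-CTR pad derivation, so this shows the shape of the scheme, not its cryptography):

```rust
// Toy illustration of the XOR trick behind any-trust anonymity.
// Fixed pads stand in for the real cryptographic pad derivation.
fn xor(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let message = b"hi".to_vec();
    // One pad per committee server; the client pre-blinds with all of them.
    let pads: Vec<Vec<u8>> = vec![vec![0x5a, 0x13], vec![0xc4, 0x7e]];

    let mut envelope = message.clone();
    for pad in &pads {
        envelope = xor(&envelope, pad);
    }

    // Only the full set of pads recovers the plaintext slot.
    let mut aggregate = envelope.clone();
    for pad in &pads {
        aggregate = xor(&aggregate, pad);
    }
    assert_eq!(aggregate, message);

    // With any one pad withheld, the slot stays blinded — which is why
    // a single honest committee server is enough for anonymity, and why
    // liveness needs every server to participate.
    let partial = xor(&envelope, &pads[0]);
    assert_ne!(partial, message);
    println!("ok");
}
```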

What this prototype is not

  • A production anonymous broadcast system. Ratcheting, footprint scheduling, cover traffic, multi-tier aggregators, and TDX-only builds tracked in the Roadmap to v2 are all deferred.
  • Byzantine fault tolerant. Mosaik is explicit about this; zipnet inherits the assumption. See Threat model for the precise statement.

Layout of the source tree

crates/
  zipnet            SDK facade (Zipnet::bind, UNIVERSE, instance_id!)
  zipnet-proto      wire types, crypto, XOR
  zipnet-core       Algorithms 1/2/3 as pure functions
  zipnet-node       mosaik integration
  zipnet-client     TEE client binary
  zipnet-aggregator aggregator binary
  zipnet-server     committee server binary
book/               this book

See Crate map for the dependency graph and purity boundaries.

Who this book is for

audience: all

The zipnet book has three audiences. Every chapter declares its audience on the first line (audience: users | operators | contributors | both | all) and respects that audience’s conventions. This page is the authoritative description of each audience and the tone we hold ourselves to. New pages must pick one.

Mixing audiences wastes readers’ time and erodes trust. When content genuinely serves more than one group, use both (users + operators, users + contributors, …) or all, and structure the page so each audience gets the answer it came for in the first paragraph.

Users

Who they are. External Rust developers building their own mosaik agents that publish into — or read from — a running zipnet instance. They do not run committee servers or the aggregator; that is the operator’s job. They are integrators, not protocol implementers.

What they can assume.

  • Comfortable with async Rust and the mosaik book.
  • Already have a mosaik application in mind; zipnet is a dependency, not the centre of their work.
  • They bring their own Arc<Network> and own its lifecycle.

What they do not need.

  • Protocol theory. A user who wants it can follow the link to the contributor pages.
  • An explanation of mosaik primitives. Link the mosaik book instead.
  • A committee operator’s view of keys, rotations, or monitoring.

What they care about.

  • “What do I import?”
  • “How do I bind to the operator’s instance?”
  • “What does the operator owe me out of band — universe, instance name, MR_TD?”
  • “What does an error actually mean when it fires?”

Tone. Code-forward and cookbook-style. Snippets are rust,ignore, self-contained, and meant to be lifted into the reader’s workspace. Public API surfaces are listed as tables. Common pitfalls are called out inline so the reader does not have to infer them from silence. Second person (“you”) throughout.

Canonical user page. Quickstart — publish and read.

Operators

Who they are. Devops staff deploying and maintaining zipnet instances. They run the committee, the aggregator, and the TDX images. They are the ones the users rely on.

What they can assume.

  • Familiar with Linux ops, systemd units, cloud networking, TLS, Prometheus.
  • Comfortable reading logs and dashboards.
  • Not expected to read Rust source. A Rust or protocol detail that is load-bearing for an operational decision belongs in a clearly marked “dev note” aside that can be skipped.

What they do not need.

  • The paper. Link it when a term is inherited; do not re-derive.
  • Internal crate layering. The operator cares what a binary does, not which crate it lives in.
  • Client-side ergonomics. That is the users’ book.

What they care about.

  • “What do I run, on what hardware, with what env vars?”
  • “How do I know it is healthy?”
  • “How do I rotate secrets / retire an instance / upgrade an image?”
  • “What page covers the alert that just fired?”

Tone. Calm, runbook-style. Numbered procedures, parameter tables, one-line shell snippets. Pre-empt the obvious “what if…” questions inline. Avoid “simply” and “just”. Every command should either be safe to run verbatim or clearly marked as needing adaptation.

Canonical operator page. Quickstart — stand up an instance.

Contributors

Who they are. Senior Rust engineers with distributed-systems and cryptography background, extending the protocol or the code, or standing up a new service on mosaik that reuses zipnet’s deployment pattern.

What they can assume.

  • Have read the ZIPNet paper (eprint 2024/1227).
  • Have read the mosaik book and are comfortable with Stream, Group, Collection, TicketValidator, the when() DSL, declare! macros.
  • Comfortable with async Rust, Raft, DC nets.

What they do not need.

  • Re-exposition of the paper. Cite section numbers (e.g. “§3.2”) and move on.
  • Primitives covered in the mosaik book. Link it.
  • User-level ergonomics unless they drive a design choice.

What they care about.

  • “Why is it this shape and not Shape A / B / C / D?”
  • “What invariants must hold? Where are they enforced?”
  • “What breaks when I bump StateMachine::signature()?”
  • “Where do I extend this — which module, which trait, which test?”

Tone. Dense, precise, design-review style. ASCII diagrams, pseudocode, rationale. rust,ignore snippets and structural comparisons without apology.

Canonical contributor page. Designing coexisting systems on mosaik.

Shared writing rules

  • No emojis anywhere in the book or the code.
  • No exclamation marks outside explicit security warnings.
  • Link the paper by section number when inheriting its terminology (e.g. “§3.2 scheduling”), not by paraphrase.
  • Link the mosaik book rather than re-explaining mosaik primitives. Our readers can follow a link.
  • Security-relevant facts are tagged with a visible admonition, not hidden inline.
  • Keep the three quickstarts synchronised. When the public SDK shape, the deployment model, or the naming convention changes, update the users, operators, and contributors quickstarts together, not “this one first, the others later”.

What you need from the operator

audience: users

Before you can write a line of code against a running zipnet deployment, collect two (or three, if it is TDX-gated) items from whoever runs it. That is the whole handshake — zipnet does not gossip an instance registry, so everything you need to reach the deployment has to arrive out of band.

The handshake

| # | Item | What it is | Where it goes in your code |
|---|------|------------|----------------------------|
| 1 | Instance name | Short namespaced string that names the deployment. Examples: acme.mainnet, preview.alpha, dev.ci-42. | Zipnet::bind(&network, "acme.mainnet") |
| 2 | Bootstrap PeerId | At least one reachable peer on the shared universe — typically the operator’s aggregator or a committee server. Without one, cold-start discovery falls back to the Mainline DHT and takes minutes instead of seconds. | discovery::Config::builder().with_bootstrap(peer_id) on the Network builder |
| 3 | Committee MR_TD (TDX-gated deployments only) | 48-byte hex measurement of the operator’s committee image. Pin this if your agent verifies inbound committee attestation, or match it if you are building a client image. | See TEE-gated deployments for which applies to your setup |

The instance name is the one thing that differs between deployments. It fully determines every on-wire ID the SDK uses — committee GroupId, submit StreamId, broadcasts StoreId, ticket class — via a single blake3("zipnet." + instance_name) derivation. If your string disagrees with the operator’s by one character, your code derives IDs nobody is serving, and Zipnet::bind returns Error::ConnectTimeout after the bond window elapses.
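The shape of that derivation can be sketched as follows. This is illustrative only: std’s DefaultHasher stands in for blake3 so the example runs without external crates, and the helper names are not SDK API.

```rust
// Illustrative sketch of the single-salt ID derivation. blake3 is the
// real hash; std's DefaultHasher stands in so this runs with std only.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn salt(instance_name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    format!("zipnet.{instance_name}").hash(&mut h);
    h.finish()
}

fn derive(salt: u64, purpose: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (salt, purpose).hash(&mut h);
    h.finish()
}

fn main() {
    let s = salt("acme.mainnet");
    // Every on-wire ID flows from the one salt, separated by purpose.
    let committee_group = derive(s, "committee");
    let submit_stream = derive(s, "submit");
    assert_ne!(committee_group, submit_stream);

    // A one-character typo yields a disjoint ID space: your code derives
    // IDs nobody is serving, which surfaces as ConnectTimeout.
    assert_ne!(salt("acme.mainnet"), salt("acme.mainet"));
    println!("ok");
}
```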

The bootstrap peer is universe-level, not zipnet-specific. Any reachable peer on the shared universe is a valid starting point; once you are bonded, mosaik’s discovery finds the specific instance’s committee and aggregator through the shared peer catalog.

The MR_TD is relevant only if the operator has turned on TDX gating. Most development deployments do not; production often does.

What you do not need to ask for

  • The universe NetworkId. It is zipnet::UNIVERSE — a shared constant baked into the SDK. Every operator and every user on zipnet uses the same value. You only need an operator-supplied override in the rare case they run an isolated federation on a different universe; assume they will tell you explicitly if so.
  • Per-instance StreamId / StoreId / GroupId values. The SDK derives all of them from the instance name. Operators never hand these out, and the facade does not accept them.
  • Committee server secrets or any committee member’s X25519 secret. You are a consumer, not a committee member.
  • A seat on the committee’s Raft group. The SDK reads the broadcast log through a replicated collection; it does not vote.

How the handshake travels

Out of band. Release notes, a README in the operator’s repo, a Slack message, a secret-manager entry. Zipnet deliberately does not carry an on-network registry — the shared-universe model assumes consumers compile in the instance name they trust rather than discovering “what instances exist” at runtime. See Designing coexisting systems on mosaik for the rationale.

Pinning the instance name at compile time

A typo in the instance name silently produces a different UniqueId and surfaces as ConnectTimeout. For production code, bake the name in with the instance_id! macro so typos become build errors:

use zipnet::{Zipnet, UniqueId, UNIVERSE};

const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");

let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;

instance_id!("acme.mainnet") and zipnet::instance_id("acme.mainnet") produce identical bytes, so an operator’s ZIPNET_INSTANCE=acme.mainnet env var and your compile-time constant land on the same UniqueId.

What you bring yourself

  • Your mosaik SecretKey if you want a stable PeerId across restarts. Leave it unset to get a random identity per run, which is the usual choice for anonymous-use-case clients. See Identity.
  • Your message payloads. The SDK does not care what bytes you put in — any impl Into<Vec<u8>>.

Minimal smoke test before writing anything substantial

Once you have the two items (three if TDX-gated), this program publishes to the deployment and prints a receipt within a few round periods:

use std::sync::Arc;
use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let bootstrap = "<paste-the-operator's-peer-id>".parse()?;

    let network = Arc::new(
        Network::builder(UNIVERSE)
            .with_discovery(discovery::Config::builder().with_bootstrap(bootstrap))
            .build()
            .await?,
    );

    let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;
    let receipt = zipnet.publish(b"hello from my laptop").await?;
    println!("landed in round {} slot {}", receipt.round, receipt.slot);
    Ok(())
}

If bind returns ConnectTimeout, the instance name or the bootstrap peer is the first suspect — see Troubleshooting.

Trust

The operator is trusted for liveness — they can stall or kill rounds at will. They are not trusted for anonymity, provided the any-trust assumption holds across their committee. See Threat model if you are auditing before integrating.

Quickstart — publish and read

audience: users

You bring a mosaik::Network; the SDK layers ZIPNet on top of it as one service among many on a shared mosaik universe. Every deployment is identified by an instance name. You bind to the one you want with Zipnet::bind(&network, instance_name).

Why you might want this

You’re building something where a bounded, authenticated set of participants needs to publish messages without revealing which participant sent which. The canonical case is an encrypted mempool: TDX-attested wallets seal transactions and publish them through zipnet; builders read an ordered broadcast log of sealed transactions; nobody — not even a compromised builder — can link a transaction to its sender until on-chain execution reveals whatever the transaction itself reveals. The encryption layer (threshold decryption, TEE unsealing, or none) sits on top; zipnet supplies the anonymous, ordered, sybil-resistant publish channel underneath.

Other deployments in the same shape:

  • Permissioned order-flow auctions. Whitelisted searchers publish intents; builders bid without knowing which searcher sent what.
  • Anonymous governance signalling. Token-holder wallets cast signals a delegate can tally without learning which wallet sent any given one.
  • Private sealed-bid auctions. Bidders publish; outcome is public; bid-to-bidder linkage is cryptographic.

What zipnet uniquely provides across these:

  • Sender anonymity within an attested set. A compromised reader cannot tie a message back to its author unless every committee operator colludes (any-trust).
  • Shared ordered view. Every subscriber sees the same log in the same order. No relay-race asymmetry between readers.
  • Sybil resistance. Only TDX-attested clients can publish.
  • Censorship resistance at the publish layer. Readers can’t drop messages from specific authors because authorship is unlinkable.

If you’re the operator standing up the deployment rather than using one, read the operator quickstart instead.

The one-paragraph mental model

A mosaik universe is a single shared NetworkId. Many services — zipnet, multisig signers, secure storage, oracles — live on it simultaneously. An operator can run any number of instances of zipnet (“mainnet”, “preview.alpha”, “acme-corp”) concurrently on the same universe; each instance has its own committee, its own ACL, its own round parameters, and its own ticket class. You pick the one you want by name — the operator tells you which name to use, and your code bakes it in. No registry lookup, no runtime discovery of “what instances exist”. The same Arc<Network> handle can also bind to other services without needing a second network.

Cargo.toml

[dependencies]
zipnet  = "0.1"
mosaik  = "=0.3.17"
tokio   = { version = "1", features = ["full"] }
futures = "0.3"
anyhow  = "1"

zipnet re-exports mosaik::{Tag, unique_id!} so you rarely reach for mosaik directly in small agents, but you’ll usually keep mosaik as a direct dep since you’re the one owning the Network.

Publisher

use std::sync::Arc;

use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "mainnet").await?;

    let receipt = zipnet.publish(b"hello from my agent").await?;
    println!("landed in round {} slot {}", receipt.round, receipt.slot);

    Ok(())
}

Three lines inside main:

  1. Create a mosaik network on the shared universe NetworkId.
  2. Bind to the mainnet zipnet instance. The SDK resolves the instance salt to concrete stream, collection, and group IDs, installs the client identity, attaches the bundle ticket, and waits until you are in a live round’s roster.
  3. publish resolves after the broadcast finalizes.

UNIVERSE is the shared NetworkId that hosts the deployment. Zipnet exports this constant today; when mosaik ships a canonical universe constant, this value will be re-exported verbatim. See Designing coexisting systems on mosaik for the full rationale.

Subscriber

use std::sync::Arc;

use futures::StreamExt;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "mainnet").await?;

    let mut rounds = zipnet.subscribe().await?;
    while let Some(round) = rounds.next().await {
        for msg in round.messages() {
            println!("round {}: {:?}", round.id(), msg.bytes());
        }
    }
    Ok(())
}

round.messages() yields only payloads that decoded cleanly — falsification-tag verification and collision filtering happen inside the SDK. Reach for round.raw() if you need the BroadcastRecord.

Binding to a testnet, devnet, or tenant instance

Instance names are free-form strings; well-known names are conventions, not types. An operator running a testnet gives you its instance name (e.g. preview.alpha) along with the universe-level bootstrap peers and any required TDX measurement.

use std::sync::Arc;

use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "preview.alpha").await?;

    let _ = zipnet.publish(b"hi from testnet").await?;
    Ok(())
}

| Instance name | What operators commonly use it for |
|---------------|------------------------------------|
| mainnet | Production deployment, long-lived committee |
| preview.<tag> | Long-lived testnet on a per-tag TDX image |
| dev.<tag> | Per-developer or per-CI-job ephemeral instance |
| anything else | Whatever the operator tells you |

The SDK itself does not dispatch on the name — TDX attestation is controlled by the tee-tdx Cargo feature on the zipnet crate, not by the instance name you pick. The table above is naming convention, not policy.

The instance name is the only piece of zipnet-specific identity the SDK needs. It fully determines the committee GroupId, the submit StreamId, the broadcasts StoreId, and the ticket class — all derived from one salt (see Designing coexisting systems on mosaik).

A typo in the instance name is silent — your code derives different IDs from the operator’s, nobody serves them, and bind returns ConnectTimeout after the bond window elapses. For production, consider pinning the instance as a compile-time UniqueId constant using the instance_id! macro, so a typo is caught at build time:

use zipnet::{Zipnet, UniqueId, UNIVERSE};

const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");

let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;

The instance_id! macro and the runtime instance_id function produce identical bytes for the same name, so the operator’s ZIPNET_INSTANCE=acme.mainnet env var and your compile-time constant land on the same UniqueId.

Sharing one Network across services and instances

Because Zipnet::bind only takes &Arc<Network>, one network handle can simultaneously serve many services and many instances of the same service:

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);

    // two zipnet instances side by side
    let prod    = Zipnet::bind(&network, "mainnet").await?;
    let testnet = Zipnet::bind(&network, "preview.alpha").await?;

    // …and unrelated services on the same network
    // let multisig = Multisig::bind(&network, "treasury").await?;
    // let storage  = Storage::bind(&network,  "archive").await?;

    let _ = prod.publish(b"production message").await?;
    let _ = testnet.publish(b"dry-run message").await?;
    Ok(())
}

Every instance and every service derives its own IDs from its own salt, so they coexist on the shared catalog without collision. You pay for one mosaik endpoint, one DHT record, one gossip loop — not one per service.

Bring-your-own-config

You keep full control of the mosaik builder; the SDK never constructs the Network for you:

use std::{net::SocketAddr, sync::Arc};

use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(
        Network::builder(UNIVERSE)
            .with_mdns_discovery(true)
            .with_discovery(discovery::Config::builder().with_bootstrap(universe_bootstrap_peers()))
            .with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
            .build()
            .await?,
    );

    let zipnet = Zipnet::bind(&network, "mainnet").await?;
    let _ = zipnet.publish(b"hi").await?;
    Ok(())
}

fn universe_bootstrap_peers() -> Vec<mosaik::PeerId> { vec![] }

Bootstrap peers are universe-level, not zipnet-specific. Any reachable peer on the shared network — a mosaik registry node, a friendly operator’s aggregator, your own relay — works as a starting point. Once you’re bonded, Zipnet::bind locates the specific instance’s committee and aggregator via the shared peer catalog.

What you get back

pub struct Receipt {
    pub round:   zipnet::RoundId,
    pub slot:    usize,
    pub outcome: zipnet::Outcome,
}

pub enum Outcome { Landed, Collided, Dropped }

pub struct Round { /* opaque */ }
impl Round {
    pub fn id(&self) -> zipnet::RoundId;
    pub fn messages(&self) -> impl Iterator<Item = zipnet::Message>;
    pub fn raw(&self) -> &zipnet::BroadcastRecord;
}

pub struct Message { /* opaque */ }
impl Message {
    pub fn bytes(&self) -> &[u8];
    pub fn slot(&self) -> usize;
}

Almost every application uses Receipt::outcome and Message::bytes() and ignores the rest.

Error model

pub enum Error {
    WrongUniverse { expected: mosaik::NetworkId, actual: mosaik::NetworkId },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}

ConnectTimeout is the one you’ll hit in development — usually a typo in the instance name (you’re deriving a GroupId nobody is serving), an unreachable bootstrap peer, or an operator whose committee isn’t up yet. WrongUniverse shows up if your Network was built against a different universe NetworkId than the SDK expects.
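As a rough triage sketch for the variants above (the enum is inlined locally so the snippet stands alone; the advice strings are illustrative, not SDK output):

```rust
// Triage sketch for the error enum shown above. The real type lives in
// the zipnet crate; it is inlined here so the example is self-contained.
#[derive(Debug)]
enum Error {
    WrongUniverse,
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}

fn advice(err: &Error) -> &'static str {
    match err {
        // The common development failure: check the name and the peer first.
        Error::ConnectTimeout => "verify instance name and bootstrap peer",
        Error::WrongUniverse => "rebuild the Network against zipnet::UNIVERSE",
        Error::Attestation(_) => "compare your pinned MR_TD with the operator's",
        Error::Shutdown => "the binding was dropped; bind again",
        Error::Protocol(_) => "see Troubleshooting",
    }
}

fn main() {
    assert_eq!(advice(&Error::ConnectTimeout), "verify instance name and bootstrap peer");
    println!("ok");
}
```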

Cover traffic is on by default

An idle Zipnet handle sends a cover envelope each round to widen the anonymity set. See Publishing messages for how to tune or disable it.

Shutdown

drop(zipnet);             // fine — the driver task exits cleanly
zipnet.shutdown().await?; // if you want to flush pending publishes first

Dropping one Zipnet handle only shuts that binding down; the Network stays up as long as other handles (or you) hold it. This is the intended pattern when one process talks to several services or several instances.

Next reading

Client identity

audience: users

A zipnet client has two distinct identities that work together. The SDK manages one of them for you; the other you control through the mosaik Network you hand to Zipnet::bind.

Two identities

| Identity | Type | Where it comes from | Purpose |
|----------|------|---------------------|---------|
| PeerId | ed25519 public key | mosaik / iroh SecretKey on the Network | Authenticates you on the wire. Signs your PeerEntry. |
| Client-side DH identity | X25519 keypair | Generated inside Zipnet::bind per binding | Names your slot in the anonymous-broadcast rounds. Binds your pads. |
The DH identity is internal to zipnet and not exposed across the SDK surface — you never see a ClientId or DhSecret type in user code. Every call to Zipnet::bind generates a fresh DH keypair, installs the matching bundle ticket through mosaik’s discovery layer, and waits until the committee admits the binding into a live round. When you drop the Zipnet handle, that keypair and its ticket go with it.

Your PeerId is the only identity you materially choose.

Choose your PeerId lifetime

Fully ephemeral (default)

Build the Network without calling with_secret_key. Mosaik picks a random iroh identity per run. Combined with the per-bind DH identity, this means every process run is an unlinkable (PeerId, client-DH) pair.

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

let network = Arc::new(Network::new(UNIVERSE).await?);
let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;

This is the right default for anonymous use cases. An observer correlating PeerIds across rounds learns only “this peer was online during this interval” — which is what mosaik’s transport layer exposes anyway, independent of zipnet.

Stable PeerId, ephemeral client DH identity

Useful when you want a predictable bootstrap target (your agent’s PeerId stays the same across restarts) but you don’t want to be correlatable inside zipnet rounds. Each bind gets a fresh DH keypair regardless of the PeerId.

use std::sync::Arc;
use mosaik::{Network, SecretKey};
use zipnet::{Zipnet, UNIVERSE};

let sk = SecretKey::from_bytes(&my_seed_bytes);
let network = Arc::new(
    Network::builder(UNIVERSE)
        .with_secret_key(sk)
        .build()
        .await?,
);
let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;

Rebinding produces a new client DH identity even with the same PeerId, so rounds stay unlinkable at the zipnet layer. If you hold one Zipnet handle for a long time and publish many messages, those messages share one client DH identity and are linkable to each other. To rotate, drop the handle and call bind again.

Stable everything (rare)

The current SDK does not expose a way to persist the per-binding DH identity across restarts. If you need stable client identity for a reputation or allowlist use case, talk to the operator about attested-client TDX features — see TEE-gated deployments. Stable anonymous-publish identity at the application layer is an anti-pattern: it trivially breaks unlinkability across rounds.

Multiple bindings per process

Zipnet::bind only borrows the Arc<Network>, so one network can host many bindings — the same instance many times, different instances side by side, or zipnet alongside other mosaik services:

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

let network  = Arc::new(Network::new(UNIVERSE).await?);
let prod     = Zipnet::bind(&network, "acme.mainnet").await?;
let testnet  = Zipnet::bind(&network, "preview.alpha").await?;
let prod_bis = Zipnet::bind(&network, "acme.mainnet").await?;

prod and prod_bis have the same PeerId but independent client DH identities; the committee treats them as two distinct publishers. This is occasionally useful for widening your own anonymity set in test deployments, but it does not buy you extra anonymity in production against a global observer watching your network interface.

Rotating

Drop the Zipnet handle and call bind again:

drop(prod);
let prod = Zipnet::bind(&network, "acme.mainnet").await?;

drop tears down the driver task, removes the bundle ticket from discovery, and lets the committee’s roster forget the old DH identity at the next gossip cycle. The next bind starts clean.

If you want to flush pending publishes before dropping, prefer zipnet.shutdown().await? — see Publishing.

What about the peer catalog?

The mosaik peer catalog — network.discovery().catalog() — lists every peer zipnet and anything else on the shared universe sees. It is not zipnet-specific, and the SDK does not ask you to interact with it. If you need to inspect it for debugging, see the mosaik book on discovery.

Publishing messages

audience: users

Everything about getting a payload into a finalized broadcast round.

The whole surface

impl Zipnet {
    pub async fn publish(&self, payload: impl Into<Vec<u8>>) -> Result<Receipt>;
}

pub struct Receipt {
    pub round:   zipnet::RoundId,
    pub slot:    usize,
    pub outcome: zipnet::Outcome,
}

pub enum Outcome { Landed, Collided, Dropped }

One call per message. publish resolves after the round carrying the payload finalizes — not when the aggregator accepts the envelope. The Receipt tells you what actually happened.

Fire-and-forget

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;

    let _ = zipnet.publish(b"hello").await?;
    Ok(())
}

If you don’t care whether the message landed, collided, or was dropped, discard the Receipt. Many applications do exactly this — the encryption or ordering layer built on top replays lost messages at the application level.

Inspecting the outcome

use zipnet::Outcome;

let receipt = zipnet.publish(payload).await?;
match receipt.outcome {
    Outcome::Landed   => tracing::info!(round = %receipt.round, "published"),
    Outcome::Collided => {
        // Another client hashed to the same slot this round. Both
        // payloads are XOR-corrupted. Retry on the next round.
        tracing::warn!(round = %receipt.round, "collision, retrying");
        // … call publish again with the same payload …
    }
    Outcome::Dropped  => {
        // The aggregator never forwarded the envelope into a
        // committed aggregate. Usually transient — the aggregator
        // was offline or our registration hadn't propagated yet.
        tracing::warn!(round = %receipt.round, "dropped, retrying");
    }
}

Landed is the happy path. Under default parameters and a small active set, most rounds produce Landed for everyone.

Retry policy

Zipnet does not retry for you. If you need at-least-once delivery at the application layer, wrap publish in your own loop:

use zipnet::{Outcome, Zipnet};

async fn publish_with_retry(z: &Zipnet, payload: Vec<u8>, attempts: u32)
    -> zipnet::Result<zipnet::Receipt>
{
    let mut receipt = z.publish(payload.clone()).await?;
    for _ in 1..attempts {
        if matches!(receipt.outcome, Outcome::Landed) {
            return Ok(receipt);
        }
        receipt = z.publish(payload.clone()).await?;
    }
    // After `attempts` tries, hand back the last receipt (probably
    // Collided/Dropped) and let the caller decide what to surface.
    Ok(receipt)
}

Retry latency is bounded by the round cadence of the deployment — at the default ~2 s round period, three attempts cost up to ~6 s. Tune attempts to your SLA.

Payload budget

The SDK accepts impl Into<Vec<u8>>. Internally, a payload that exceeds the round’s per-slot budget is rejected; payloads that fit are zero-padded into their slot. Current default round parameters give you 240 bytes of application payload per publish. For larger messages, split at the application layer — the protocol does not frame for you.

If your deployment uses non-default parameters, the operator will tell you the budget. The SDK surfaces oversized payloads as Error::Protocol("payload too large: …"); see Troubleshooting.
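If you need to carry messages larger than the budget, a minimal application-level framing might look like the following sketch. The BUDGET constant and the two-byte (index, count) header are assumptions invented here, not protocol constants:

```rust
/// Split an oversized message into publishes that fit the per-slot
/// budget. The two-byte header (chunk index, chunk count) is an
/// application-level convention for this sketch only.
const BUDGET: usize = 240; // default per-publish application budget
const HEADER: usize = 2;

fn chunk(message: &[u8]) -> Vec<Vec<u8>> {
    let body = BUDGET - HEADER;
    let total = message.len().div_ceil(body).max(1) as u8;
    message
        .chunks(body)
        .enumerate()
        .map(|(i, part)| {
            let mut payload = Vec::with_capacity(HEADER + part.len());
            payload.push(i as u8); // chunk index
            payload.push(total);   // chunk count
            payload.extend_from_slice(part);
            payload
        })
        .collect()
}

fn main() {
    let chunks = chunk(&[0u8; 500]);
    assert_eq!(chunks.len(), 3); // ceil(500 / 238)
    assert!(chunks.iter().all(|c| c.len() <= BUDGET));
}
```

The receiver reassembles by (index, count); messages longer than 255 chunks would need a wider header than this sketch uses.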

Cover traffic is on by default

An idle Zipnet handle sends a cover envelope each round to widen the anonymity set. Cover envelopes do not show up as publish calls on your side — they are generated automatically by the binding’s driver task. Observers cannot distinguish a cover round from a real-payload round for any given participant.

There is no SDK knob to tune cover-traffic rate today. If you hold a Zipnet handle, you participate; if you drop it, you don’t. For applications that want to appear only during certain rounds, bind immediately before you need to publish and drop immediately after — see Identity.

Parallel publishes on one handle

Zipnet is Clone and internally Arc-wrapped. Concurrent publish calls on one handle are fine; the driver serializes them per-round and emits at most one payload per round per binding. If you call publish twice during the same round window, the second call waits for the next round rather than sharing the slot.

let z = Zipnet::bind(&network, "acme.mainnet").await?;
let a = z.clone();
let b = z.clone();

let (ra, rb) = tokio::join!(
    a.publish(b"message A"),
    b.publish(b"message B"),
);
// ra and rb come from different rounds.

If you need higher throughput per wall-clock second, the right lever is operator-side round cadence or num_slots. From the SDK, one binding is one slot per round.
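The resulting per-binding ceiling is easy to write down. Back-of-envelope only; these helpers are not part of the SDK:

```rust
/// One payload per round per binding, so throughput is bounded by the
/// deployment's round cadence and per-slot budget.
fn max_messages_per_sec(round_period_secs: f64) -> f64 {
    1.0 / round_period_secs
}

fn max_payload_bytes_per_sec(budget_bytes: f64, round_period_secs: f64) -> f64 {
    budget_bytes / round_period_secs
}

fn main() {
    // Default parameters quoted in this book: ~2 s rounds, 240-byte budget.
    println!("{} msg/s", max_messages_per_sec(2.0));           // 0.5
    println!("{} B/s", max_payload_bytes_per_sec(240.0, 2.0)); // 120
}
```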

Shutdown

drop(zipnet);               // fine — the driver exits cleanly, in-flight publishes may be lost
zipnet.shutdown().await?;   // waits for in-flight receipts, then tears down

shutdown returns Error::Shutdown if the binding was already closing. Otherwise the call resolves once pending publishes have either landed or been marked Dropped. Use it in application-level shutdown paths where losing a trailing publish would be surprising.

Dropping one Zipnet handle does not tear down the Network. Other services or other zipnet instances sharing the same Arc<Network> keep running.

Reading the broadcast log

audience: users

Zipnet::subscribe returns a stream of finalized rounds. Every subscriber sees the same log in the same order.

The whole surface

impl Zipnet {
    pub async fn subscribe(&self) -> Result<BroadcastStream>;
}

// BroadcastStream implements futures::Stream<Item = Round>.

pub struct Round { /* opaque */ }
impl Round {
    pub fn id(&self) -> zipnet::RoundId;
    pub fn messages(&self) -> impl Iterator<Item = Message>;
    pub fn raw(&self) -> &zipnet::BroadcastRecord;
}

pub struct Message { /* opaque */ }
impl Message {
    pub fn bytes(&self) -> &[u8];
    pub fn slot(&self) -> usize;
}

One call per subscriber. Every call to subscribe returns a fresh receiver; handles are cheap.

Tail the log as it grows

use std::sync::Arc;
use futures::StreamExt;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;

    let mut rounds = zipnet.subscribe().await?;
    while let Some(round) = rounds.next().await {
        for msg in round.messages() {
            println!("round {}: {:?}", round.id(), msg.bytes());
        }
    }
    Ok(())
}

round.messages() yields only payloads that decoded cleanly — falsification-tag verification and collision filtering happen inside the SDK. You see the application bytes the publisher actually sealed, not the raw slot bytes.

round.id() is monotonically increasing. Consecutive items from the stream have strictly increasing ids under normal operation.

Wait for a specific round

use futures::StreamExt;
use zipnet::{Zipnet, RoundId};

async fn wait_for_round(
    zipnet: &Zipnet,
    target: RoundId,
) -> anyhow::Result<zipnet::Round> {
    let mut rounds = zipnet.subscribe().await?;
    while let Some(round) = rounds.next().await {
        if round.id() == target {
            return Ok(round);
        }
        if round.id() > target {
            anyhow::bail!("round {target} is already in the past");
        }
    }
    anyhow::bail!("stream closed before round {target}")
}

If you subscribe after the round you care about has already finalized, the stream will skip past it — it only yields rounds that finalize after subscribe returns. Keep your subscription open if you care about a specific future round.

Gap detection and catch-up

A fresh subscription begins from whatever the committee finalizes next. Earlier rounds are not replayed. If you need the full history, open the subscription before you publish anything and buffer yourself.

If the subscriber falls behind — usually because your round handler is slower than the round cadence — the SDK’s internal broadcast channel lags. You see this as a round id gap: one call to rounds.next().await returns round N, the next returns round N + k for some k > 1. The lost rounds are gone; the SDK does not backfill them. The fix is to make the handler non-blocking — offload heavy work to a separate task:

use futures::StreamExt;
use tokio::sync::mpsc;

let mut rounds = zipnet.subscribe().await?;
let (tx, mut rx) = mpsc::channel(1024);

// Producer: drain the SDK stream as fast as it delivers.
tokio::spawn(async move {
    while let Some(round) = rounds.next().await {
        if tx.send(round).await.is_err() {
            break;
        }
    }
});

// Consumer: heavy per-round work that can tolerate small bursts.
while let Some(round) = rx.recv().await {
    handle(round).await;
}

With this shape, the SDK’s internal buffer drains continuously; the bounded channel between tasks is the one that can fill up, and you control its size.
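If you want to log which rounds were missed rather than just tolerate the gap, a small helper over the observed ids works. This is a sketch: RoundId is shown as a plain u64 here, and find_gaps is not an SDK function:

```rust
/// Return the inclusive ranges of round ids missing between
/// consecutive observed ids.
fn find_gaps(observed: &[u64]) -> Vec<(u64, u64)> {
    observed
        .windows(2)
        .filter(|w| w[1] > w[0] + 1)
        .map(|w| (w[0] + 1, w[1] - 1))
        .collect()
}

fn main() {
    // A lagging subscriber saw round 7 and then round 10:
    // rounds 8..=9 are gone and will not be backfilled.
    assert_eq!(find_gaps(&[5, 6, 7, 10, 11]), vec![(8, 9)]);
}
```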

Raw access

Round::messages() hides everything zipnet-specific — which client occupied which slot, how the broadcast vector was laid out, the server roster for the round. When you need the underlying BroadcastRecord, reach for raw():

use zipnet::BroadcastRecord;

while let Some(round) = rounds.next().await {
    let rec: &BroadcastRecord = round.raw();
    tracing::debug!(
        round = %rec.round,
        n_participants = rec.participants.len(),
        n_servers = rec.servers.len(),
        broadcast_bytes = rec.broadcast.len(),
    );
}

BroadcastRecord is a public type from zipnet-proto. Most applications never need it — the hidden-behind-messages() decode pipeline is what you want.

Multiple subscribers

One Zipnet handle can produce many subscribers:

let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
let mut rounds_a = zipnet.subscribe().await?;
let mut rounds_b = zipnet.subscribe().await?;

Both receive the same rounds in the same order. Independent lag: slowing down subscriber A does not affect subscriber B.

Shutdown

Dropping the stream is enough. The SDK’s driver keeps running as long as the Zipnet handle lives; the next subscribe call gives you a fresh stream from the then-current point in the log.

Connecting to the universe

audience: users

The nuts and bolts of building the Arc<Network> that Zipnet::bind attaches to. The zipnet SDK never constructs the network for you — this is intentional. One network can host zipnet alongside other mosaik services on the shared universe, and you own its lifetime.

The minimum

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(Network::new(UNIVERSE).await?);
    let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;

    let _ = zipnet.publish(b"hello").await?;
    Ok(())
}

Network::new(UNIVERSE) produces a network with default mosaik settings — random SecretKey, mDNS off, no bootstrap peers, no prometheus endpoint. Enough for local integration tests; rarely enough for a real deployment.

Bring your own builder

For anything beyond a local experiment, use Network::builder:

use std::{net::SocketAddr, sync::Arc};
use mosaik::{Network, discovery};
use zipnet::{Zipnet, UNIVERSE};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let network = Arc::new(
        Network::builder(UNIVERSE)
            .with_mdns_discovery(true)
            .with_discovery(
                discovery::Config::builder()
                    .with_bootstrap(universe_bootstrap_peers()),
            )
            .with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
            .build()
            .await?,
    );

    let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;
    let _ = zipnet.publish(b"hi").await?;
    Ok(())
}

fn universe_bootstrap_peers() -> Vec<mosaik::PeerId> { vec![] }

Every argument above is a mosaik concern, not a zipnet one. The full builder reference lives in the mosaik book. The rest of this page covers the fields that matter most for a zipnet user.

UNIVERSE

zipnet::UNIVERSE is the shared NetworkId every zipnet deployment lives on. Today it is mosaik::unique_id!("mosaik.universe"). When mosaik ships its own canonical universe constant, this value will be re-exported verbatim.

If your Network is on a different NetworkId, Zipnet::bind rejects it with Error::WrongUniverse { expected, actual } before any I/O happens. There is no way to tunnel zipnet over a non-universe network; the SDK hard-checks this.

Bootstrap peers

Universe-level, not zipnet-specific. Any reachable peer on the shared universe works as a bootstrap — a mosaik registry node, a friendly operator’s aggregator, your own persistent relay. The operator does not typically hand out zipnet-instance-specific bootstrap peers; they publish one set of universe bootstraps that their zipnet instance (and any other services they host) joins through.

Once your network is bonded to the universe, Zipnet::bind finds the specific instance’s committee through the shared peer catalog — you do not need to know anything zipnet-specific at network-builder time.

use mosaik::discovery;
use zipnet::UNIVERSE;

let network = mosaik::Network::builder(UNIVERSE)
    .with_discovery(
        discovery::Config::builder()
            .with_bootstrap(vec![
                // universe-level bootstrap peer IDs, operator-supplied
            ]),
    )
    .build()
    .await?;

On first connect with no bootstrap peers you fall back to the DHT. That works, but it is slow (tens of seconds on a cold start). At least one bootstrap peer is a practical requirement for anything beyond local tests.

mDNS

.with_mdns_discovery(true) collapses discovery latency from minutes to seconds on a shared LAN and is harmless elsewhere. Turn it off only if your security posture forbids advertising peers over mDNS.

Secret key

Omit .with_secret_key(...) for a fresh iroh identity per run. Set a stable SecretKey if you want a predictable PeerId across restarts. See Client identity for when each is appropriate.

Reaching the universe from behind NAT

iroh handles NAT traversal through its relay infrastructure. Most residential and office setups need no extra configuration. Things that help when they don’t:

  • Outbound UDP must be allowed. iroh uses QUIC over UDP.
  • Full-cone NAT or better traverses directly. Symmetric NAT falls back to relay — still works, with extra latency.
  • UDP-terminating proxies break iroh. Run the agent from a host with raw outbound UDP.

At startup the network logs its relay choice:

relay-actor: home is now relay https://euc1-1.relay.n0.iroh-canary.iroh.link./

Repeated “Failed to connect to relay server” warnings mean your outbound path is broken; discovery mostly still works via DHT, just slow.

Observability for your own agent

use std::{net::SocketAddr, sync::Arc};
use mosaik::Network;
use zipnet::UNIVERSE;

let network = Arc::new(
    Network::builder(UNIVERSE)
        .with_prometheus_addr("127.0.0.1:9100".parse::<SocketAddr>()?)
        .build()
        .await?,
);

Then scrape http://127.0.0.1:9100/metrics — you’ll get mosaik’s metrics plus whatever you emit with the metrics crate. The zipnet SDK does not expose its own top-level metrics endpoint; observability is the network’s job.

One network, many services and instances

Because Zipnet::bind only borrows &Arc<Network>, you pay for one mosaik endpoint across every service and instance you bind:

use std::sync::Arc;
use mosaik::Network;
use zipnet::{Zipnet, UNIVERSE};

let network  = Arc::new(Network::new(UNIVERSE).await?);

let prod     = Zipnet::bind(&network, "acme.mainnet").await?;
let testnet  = Zipnet::bind(&network, "preview.alpha").await?;
// let multisig = Multisig::bind(&network, "treasury").await?;  // hypothetical
// let storage  = Storage::bind(&network,  "archive").await?;   // hypothetical

Each binding derives its own IDs from its own salt, so they coexist on the shared peer catalog without collision. One UDP socket, one DHT record, one gossip loop.

Graceful shutdown

drop(network);

drop cancels everything — open streams, collection readers, bonds. Mosaik emits a gossip departure so the operator’s logs show you leaving cleanly. If you want to flush pending zipnet publishes first, call zipnet.shutdown().await? on each binding before dropping the network. See Publishing — Shutdown.

Cold-start checklist

If your agent starts but Zipnet::bind returns ConnectTimeout:

  1. The Arc<Network> is on UNIVERSE. If you see WrongUniverse instead, the network was built against a different NetworkId. Switch back to UNIVERSE.
  2. The instance name matches the operator’s exactly. Typos surface as ConnectTimeout, not InstanceNotFound. Consider pinning via zipnet::instance_id!("name") so the name is checked at build time.
  3. Bootstrap PeerIds are reachable. nc -zv <their_host> or whatever the operator tells you to test.
  4. Outbound UDP is allowed. iperf over UDP to a public host.
  5. Your mosaik version matches (=0.3.17). Any minor-version drift changes wire formats.

If none of these resolves it, see Troubleshooting.

TEE-gated deployments

audience: users

Some zipnet deployments require every participant — committee members and publishing clients — to run inside a TDX enclave whose measurement matches the operator’s expected MR_TD. This chapter covers the user side of that setup.

Is the deployment TEE-gated?

Ask the operator. Specifically:

  • Does the committee stack a Tdx validator on its admission tickets?
  • If so, what MR_TD must your client image report?

If the answer to the first question is no, skip this chapter — the rest of the user guide applies unchanged.

How the SDK decides whether to attest

TDX is a Cargo feature on the zipnet crate, not a function of the instance name:

  • tee-tdx disabled (default). The SDK runs a mocked attestation path. Your PeerEntry does not carry a TDX quote. A TDX-gated operator’s committee rejects you at bond time — you see Error::ConnectTimeout (the rejection is silent at the discovery layer) or Error::Attestation if the operator has enabled a stricter surfacing mode.
  • tee-tdx enabled. Zipnet::bind uses mosaik’s real TDX path to generate a quote bound to your current PeerId and attach it to your discovery entry. The committee validates the quote before admitting you.

# Cargo.toml for a user-side agent that must attest.
[dependencies]
zipnet = { version = "0.1", features = ["tee-tdx"] }

With the feature on, your binary only runs correctly inside a real TDX guest. The TDX hardware refuses to quote from a non-TDX machine, so bind surfaces that as Error::Attestation("…").

Build-time: produce a TDX image

Add mosaik’s TDX builder to your crate:

[build-dependencies]
mosaik = { version = "=0.3.17", features = ["tdx-builder-alpine"] }
# or: features = ["tdx-builder-ubuntu"]

build.rs:

fn main() {
    mosaik::tee::tdx::build::alpine().build();
}

This produces a bootable TDX guest image at target/<profile>/tdx-artifacts/<crate>/alpine/ plus a precomputed <crate>-mrtd.hex. The operator either uses your MR_TD as their expected value, or — if they pin a specific image — hands you theirs and you rebuild to match.

The mosaik TDX reference covers Alpine vs Ubuntu trade-offs, SSH and kernel customization, and environment-variable overrides.

The operator → user handshake for TDX

A TDX-gated deployment adds one item to the three-item handshake in What you need from the operator:

| Item | What it is |
|---|---|
| Committee MR_TD | The 48-byte hex measurement the operator’s committee images use. |

The operator hands this out via their release notes, not via the wire. The zipnet SDK does not bake per-instance MR_TD mappings in — there is no table of “acme.mainnet requires MR_TD abc…” inside the crate. Publishing that mapping to clients, out of band, is the operator’s responsibility.

When the operator rotates the image, your old quote stops validating; the fix is to rebuild with the new MR_TD and redeploy. There is no auto-discovery of acceptable measurements on the wire.

Multi-variant deployments

During a rollout, an operator may accept multiple client MR_TDs simultaneously — usually the old and the new during a staged migration. You only need to match one of them. The precomputed hex files in target/<profile>/tdx-artifacts/<crate>/.../ tell you what your image reports; compare against the list the operator publishes.
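Checking your measurement against the operator’s published list is a plain string comparison. A sketch; the function name and the placeholder measurements are made up:

```rust
/// True if the measurement this image reports appears in the
/// operator's published list of accepted MR_TDs.
fn mrtd_accepted(mine: &str, published: &[&str]) -> bool {
    published
        .iter()
        .any(|m| m.eq_ignore_ascii_case(mine.trim()))
}

fn main() {
    // Hypothetical old + new measurements during a staged migration.
    let published = ["aa11bb22", "cc33dd44"];
    assert!(mrtd_accepted("AA11BB22", &published)); // case-insensitive hex
    assert!(!mrtd_accepted("ee55ff66", &published));
}
```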

Sealing secrets inside the enclave

Zipnet’s current SDK does not expose a sealed-storage helper — each Zipnet::bind generates a fresh per-binding DH identity in process memory. That is fine for the default anonymous-use-case model, where identity is meant to rotate.

If you need stable identity across enclave reboots for a reputation use case, you will need to persist state to TDX sealed storage yourself today. That is out of scope for the SDK and likely to land as a mosaik primitive rather than a zipnet feature; watch the mosaik release notes.

Falling back to non-TDX for development

If you’re writing integration tests and don’t want a TDX VM in the loop, build without the tee-tdx feature and use a deployment whose operator has disabled TDX gating. Typical arrangement:

  • Production and staging: tee-tdx on both sides.
  • Local dev / CI: tee-tdx off on both sides.

The operator runs the dev instance without the Tdx validator on committee admissions; you build your client without the tee-tdx feature. Both sides’ mocks line up.

Failure modes

The error the SDK surfaces when TDX is involved is Error::Attestation(String). Common causes:

  • You built with tee-tdx but aren’t running inside a TDX guest (hardware refuses to quote).
  • Your MR_TD differs from the operator’s. Rebuild with their image.
  • The operator rotated MR_TD and you haven’t. Rebuild.

ConnectTimeout can also stem from TDX mismatches on deployments that surface attestation failures silently at the bond layer; see Troubleshooting.

Troubleshooting from the user side

audience: users

Failure modes you can observe from your own agent, mapped to the SDK’s error enum and the fastest check for each.

The error enum

pub enum Error {
    WrongUniverse { expected: mosaik::NetworkId, actual: mosaik::NetworkId },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}

Five variants. The two you will hit most in development are ConnectTimeout and WrongUniverse. Everything else is either a real runtime condition or lower-level plumbing surfaced through Protocol.

Symptom: bind returns ConnectTimeout

This is the single most common dev-time error. It means the SDK could not bond to a peer serving your instance within the connect deadline. In descending order of likelihood:

1. Typo in the instance name

Your code derives UniqueIds from "zipnet." + instance_name via blake3. A one-character change produces a completely different id, and nobody is serving it.

Fix: double-check the name against the operator’s handoff. Prefer pinning it as a compile-time constant so typos become build errors:

use zipnet::{Zipnet, UniqueId, UNIVERSE};

const ACME_MAINNET: UniqueId = zipnet::instance_id!("acme.mainnet");

let zipnet = Zipnet::bind_by_id(&network, ACME_MAINNET).await?;
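blake3 isn’t in std, so the sketch below uses std’s SipHash-based hasher purely to illustrate the sensitivity of the derivation: any good hash makes the same point that a one-character typo yields an unrelated id. toy_instance_id is invented for this sketch and is not the SDK function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake3("zipnet." + name), illustrating avalanche
// behaviour only.
fn toy_instance_id(name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    format!("zipnet.{name}").hash(&mut h);
    h.finish()
}

fn main() {
    // One dropped character, completely different id -- and nobody
    // on the universe is serving the misspelled one.
    assert_ne!(
        toy_instance_id("acme.mainnet"),
        toy_instance_id("acme.mainet"),
    );
}
```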

2. Operator’s committee isn’t up

The name is right, but nobody is currently serving it. Without an on-network registry the SDK cannot distinguish “this instance does not exist” from “its committee is temporarily down” — both surface as ConnectTimeout.

Fix: ask the operator whether the deployment is live.

3. Bootstrap peers unreachable

Even if the instance name is right and the committee is up, your network never bonded to the universe — so it never found the committee. Usually shows up alongside no peer-catalog growth.

Fix: check the bootstrap peer list. See Connecting — Cold-start checklist.

4. TDX posture mismatch

Silent rejection at the bond layer from a TDX-gated deployment often looks like ConnectTimeout rather than a clear Attestation error. Common when your client is built without the tee-tdx feature against a TDX-gated operator.

Fix: see TEE-gated deployments.

Symptom: bind returns WrongUniverse

Your Arc<Network> was built against a different NetworkId than zipnet::UNIVERSE. The error payload tells you both values:

match zipnet::Zipnet::bind(&network, "acme.mainnet").await {
    Err(zipnet::Error::WrongUniverse { expected, actual }) => {
        tracing::error!(%expected, %actual, "network on wrong universe");
    }
    …
}

Fix: build the network with Network::new(UNIVERSE) or Network::builder(UNIVERSE). There is no way to tunnel zipnet over a non-universe network.

Symptom: bind returns Attestation

TDX attestation failed. The string payload names the specific failure from the mosaik TDX stack.

Common causes:

  • You built with tee-tdx but aren’t running inside a TDX guest.
  • Your MR_TD differs from the operator’s expected value (fresh image you haven’t rebuilt, or operator rotated).
  • Your quote has expired.

See TEE-gated deployments.

Symptom: publish returns Outcome::Collided

Another client hashed to the same slot this round. Both payloads get XOR-corrupted; no observable message lands for either of you.

Fix: retry on the next round. See Publishing — Retry policy.

Persistent collisions are a signal that the deployment is oversubscribed for its num_slots — an operator-side tuning problem, not a user one. Collision probability per pair per round is 1 / num_slots; for N clients the expected number of collisions per round is C(N, 2) / num_slots.
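That back-of-envelope can be made concrete. A sketch using hypothetical parameter values; expected_collisions is not an SDK function:

```rust
/// Expected number of colliding pairs per round: C(N, 2) / num_slots,
/// by linearity of expectation over all client pairs.
fn expected_collisions(n_clients: u64, num_slots: u64) -> f64 {
    if n_clients < 2 {
        return 0.0; // no pairs, no collisions
    }
    let pairs = n_clients * (n_clients - 1) / 2;
    pairs as f64 / num_slots as f64
}

fn main() {
    // Hypothetical deployment: 10 active clients on 64 slots gives
    // 45 pairs / 64 slots = 0.703125 expected collisions per round.
    println!("{}", expected_collisions(10, 64));
}
```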

Symptom: publish returns Outcome::Dropped

The aggregator never forwarded your envelope into a committed aggregate. Usually transient:

  • Aggregator was offline that round.
  • Your registration hadn’t propagated yet (first few seconds after bind).

Fix: retry. Repeated Dropped across many rounds means the aggregator is unreachable from you — check the peer catalog and bootstrap peers, then contact the operator.

Symptom: subscription sees no new rounds for a long time

Two possibilities:

1. The committee is stuck

The cluster is not finalizing rounds. Contact the operator.

2. Your binding hasn’t caught up yet

bind waits for the first live round roster before returning, so once you have a Zipnet handle, round delivery should start at the next round boundary. If it does not, you are not reaching the broadcast collection’s group — same checks as for ConnectTimeout (bootstrap, UDP egress, TDX).

Symptom: publish or subscribe returns Shutdown

The binding is closing. Either you called shutdown(), dropped every clone of the handle, or the underlying Network went down.

Fix: shutdown is idempotent-ish — further calls keep returning Shutdown. If this is unexpected, check that the Arc<Network> is still alive and that no other part of your code called shutdown on the handle.

Symptom: Error::Protocol(…) with an opaque string

The SDK bubbled up a lower-level mosaik or zipnet-protocol failure. The string content is for humans — do not pattern-match on it.

Fix: enable verbose logging and inspect the mosaik-layer event stream:

RUST_LOG=info,zipnet=debug,mosaik=info cargo run

If the root cause is in mosaik, the mosaik book has better diagnostics than this page can. Open a zipnet issue with the log excerpt if the failure looks zipnet-specific.

Symptom: subscriber lags and misses rounds

Your round handler is slower than the deployment’s round cadence. Internal broadcast channels drop rounds rather than stall the SDK, so you see gaps in round.id().

Fix: offload heavy per-round work to a separate task. See Reading — Gap detection and catch-up.

Symptom: my client compiled against one version, the operator upgraded

Mosaik pinned to =0.3.17 on both sides; zipnet and zipnet-proto baselines must also match the deployment. If WIRE_VERSION or round-parameter defaults change, your client derives different internal IDs and bind returns ConnectTimeout.

Fix: keep your zipnet dep version aligned with the operator’s release notes. Mosaik stays pinned.

When to escalate to the operator

  • bind consistently fails with ConnectTimeout after the name, bootstrap, and universe have all been verified.
  • publish keeps returning Outcome::Dropped across many rounds.
  • Your subscription opens but sees no rounds finalize over several round periods.

When you escalate, include:

  • Your mosaik version (=0.3.17) and zipnet SDK version.
  • The instance name you are binding to.
  • Whether you built with tee-tdx and, if so, your client’s MR_TD.
  • A 60-second log excerpt at RUST_LOG=info,zipnet=debug,mosaik=info.

API reference

audience: users

A compact reference of the surface the zipnet facade crate exposes. Link-in-book pages cover the “how”; this page is the “what”.

The whole import story

Almost every user-side agent pulls from exactly one module:

use zipnet::{
    // The universe constant.
    UNIVERSE,

    // The handle and its stream type.
    Zipnet, BroadcastStream,

    // Identifiers and macros.
    UniqueId, NetworkId, Tag, unique_id, instance_id,

    // Value types returned by publish / subscribe.
    Receipt, Outcome, Round, Message,

    // Protocol types re-exported from zipnet-proto.
    BroadcastRecord, RoundId,

    // Error model.
    Error, Result,
};

The instance_id! macro is re-exported at the crate root via #[macro_export], so zipnet::instance_id!("name") works alongside the runtime zipnet::instance_id(name) function.

Constants

| Item | Type | Role |
|---|---|---|
| zipnet::UNIVERSE | NetworkId | The shared mosaik universe every zipnet deployment lives on. Build your Network against it. |

Handle

#[derive(Clone)]
pub struct Zipnet { /* opaque */ }

Cloneable; all clones share one driver task. Drop every clone or call shutdown to tear down a binding.

| Method | Returns | Purpose |
|---|---|---|
| Zipnet::bind(&Arc<Network>, &str) | Result<Self> | Bind by instance name. |
| Zipnet::bind_by_id(&Arc<Network>, UniqueId) | Result<Self> | Bind by pre-derived id (use with instance_id!). |
| .publish(impl Into<Vec<u8>>) | Result<Receipt> | Publish a payload; resolves after the carrying round finalizes. |
| .subscribe() | Result<BroadcastStream> | Stream of finalized rounds. |
| .shutdown() | Result<()> | Flush in-flight publishes and tear down. |

See Publishing and Reading for usage patterns.

Identifier helpers

pub fn zipnet::instance_id(name: &str) -> UniqueId;
// macro:
pub macro zipnet::instance_id($name:literal) { /* compile-time */ }

Both produce identical bytes — blake3("zipnet." + name). Prefer the macro when the name is a literal so typos fail at build time.

Value types

pub struct Receipt {
    pub round:   RoundId,
    pub slot:    usize,
    pub outcome: Outcome,
}

pub enum Outcome {
    Landed,   // happy path
    Collided, // slot collision; retry next round
    Dropped,  // aggregator never forwarded; retry
}

pub struct Round { /* opaque */ }
impl Round {
    pub fn id(&self) -> RoundId;
    pub fn messages(&self) -> impl Iterator<Item = Message> + '_;
    pub fn raw(&self) -> &BroadcastRecord;
}

pub struct Message { /* opaque */ }
impl Message {
    pub fn bytes(&self) -> &[u8];
    pub fn slot(&self) -> usize;
}

pub struct BroadcastStream { /* opaque */ }
impl futures::Stream for BroadcastStream {
    type Item = Round;
}

Round::messages() yields only slots that decoded cleanly — malformed or colliding slots are filtered out inside the SDK. Round::raw() escapes to the underlying BroadcastRecord for the rare case you need it.

Errors

pub type Result<T, E = Error> = core::result::Result<T, E>;

#[derive(Debug, thiserror::Error)]
pub enum Error {
    WrongUniverse { expected: NetworkId, actual: NetworkId },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}

See Troubleshooting for a per-variant diagnostic checklist.

Re-exports from mosaik

| Item | From | Use |
|---|---|---|
| UniqueId | mosaik::UniqueId | Alias for 32-byte intent-addressed identifiers. |
| NetworkId | mosaik::NetworkId | Type of UNIVERSE and WrongUniverse fields. |
| Tag | mosaik::Tag | Peer-catalog tag type. Rarely needed directly. |
| unique_id! | mosaik::unique_id! | Compile-time UniqueId construction. |

Re-exports from zipnet-proto

| Item | Role |
|---|---|
| BroadcastRecord | The finalized round record inside a Round. |
| RoundId | Monotonic round counter; RoundId::next() to advance. |

What you do NOT import

  • zipnet_node::* — committee and role internals. Users do not construct CommitteeMachines or run committee Raft groups.
  • mosaik::groups::GroupKey — you do not have committee secrets.
  • Any raw StreamId / StoreId / GroupId — the SDK derives them from the instance name. Do not try to pin them yourself.

If you find yourself reaching for these, you are probably writing an operator or contributor concern. Revisit What you need from the operator.

Version compatibility

| Dependency | Version | Note |
|---|---|---|
| mosaik | =0.3.17 | Pin exactly; minor versions change wire formats. |
| zipnet | follow the deployment’s release notes | Keep in lockstep with the operator’s version. |
| tokio | 1.x | Any compatible minor. |
| futures | 0.3 | For StreamExt::next on BroadcastStream. |

When the operator announces a deployment upgrade, they should publish the zipnet version to use. Users rebuild and redeploy in lockstep.

Deployment overview

audience: operators

A zipnet deployment runs as one service among many on a shared mosaik universe — a single NetworkId that hosts zipnet alongside other mosaik services. What you stand up is an instance of zipnet under a short, namespaced name you pick (e.g. acme.mainnet). Multiple instances coexist on the same universe concurrently, each with its own committee, ACL, round parameters, and committee MR_TD.

If you haven’t yet, read the Quickstart — it walks you end-to-end from a fresh checkout to a live instance. This page gives the architectural background the runbooks later in this section refer back to.

The shared universe model

  • The universe constant is zipnet::UNIVERSE = unique_id!("mosaik.universe"). Override only for an isolated federation via ZIPNET_UNIVERSE; in the common case, leave it alone.
  • All your nodes — committee servers, aggregator, clients — join that same universe. Mosaik’s standard peer discovery (/mosaik/announce gossip plus the Mainline DHT bootstrap) handles reachability. You don’t configure streams, groups, or IDs by hand.
  • The instance is identified by ZIPNET_INSTANCE (e.g. acme.mainnet). Every sub-ID — committee GroupId, submit StreamId, broadcasts StoreId — is derived from that name, so typos surface as ConnectTimeout rather than a config error.

Publishers bond to your instance knowing only three things: the universe NetworkId, the instance name, and (for TDX-gated deployments) your committee MR_TD. You hand those out in release notes or docs; there is no on-network registry to publish to and nothing to advertise.

Three node roles

A zipnet deployment has three kinds of nodes. You — the operator — will run at least the first two. The third is optional (most publishers are external users running their own clients).

| Role | Count | Trust status | Resource profile |
|---|---|---|---|
| Committee server | 3 or more (odd) | any-trust: at least one must be honest for anonymity; all must be up for liveness in v1 | low CPU, modest RAM, stable identity, low churn |
| Aggregator | 1 (v1) | untrusted for anonymity, trusted for liveness | higher CPU + bandwidth, can churn |
| Publishing client | many | TDX-attested in production; untrusted for liveness | ephemeral; any churn is tolerated |

What every node needs

  • Outbound UDP to the internet (iroh / QUIC transport) and to mosaik relays.
  • A few MB of RAM; committee servers need more during large-round replay.
  • A clock within a few seconds of the rest of the universe (Raft tolerates skew but not arbitrary drift).
  • ZIPNET_INSTANCE=<name> set to the same instance name on every node in that deployment.

What only committee servers need

  • A stable PeerId across restarts. Set ZIPNET_SECRET to any string — it is hashed with blake3 to derive the node’s long-term iroh identity. Rotating it invalidates every bond.
  • Access to the shared committee secret, passed as ZIPNET_COMMITTEE_SECRET. This gates admission to the Raft group. Distribute it out of band (vault, secrets manager, k8s secret). Anyone holding it can join the committee — treat it like a root credential.
  • In production, a TDX host. Mosaik ships the TDX image builder; you call mosaik::tee::tdx::build::ubuntu() from your build.rs and get a launch script, initramfs, OVMF, and a precomputed MR_TD at build time. See the Quickstart’s TDX section.
  • Durable storage is not required in v1 (state is in memory). A restarted server rejoins and catches up by snapshot.

What only aggregators need

  • More network bandwidth than committee servers. The aggregator receives every client envelope and emits a single aggregate per round.
  • A stable PeerId is strongly recommended — clients often use the aggregator as a discovery bootstrap.
  • The aggregator does not need the committee secret. It is untrusted for anonymity.

What only clients need

  • The universe NetworkId, instance name, and (for TDX-gated instances) your committee MR_TD. That is the whole handshake.
  • A TDX host if the instance is TDX-gated. See Security posture checklist.

How the three talk

   clients ── ClientEnvelope stream ─────► aggregator
                                               │
                                 AggregateEnvelope stream
                                               │
                                               ▼
                                        committee servers
                                               │
                                     Raft-replicated apply
                                               │
                                               ▼
                              Broadcasts collection (readable by anyone)

Clients and the aggregator are not members of the committee’s Raft group; they observe the final broadcasts through a replicated collection.
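
The data flow above can be miniaturized in plain, std-only Rust. This is a conceptual sketch, not the real implementation — the keyed xorshift "pad" stands in for zipnet's X25519-derived pads, and each client writes to a fixed slot, which the real protocol hides — but the cancellation structure is the same: every pad appears once in a client envelope and once in a server's partial unblind, so the XOR of everything yields the plaintext slot vector without revealing which client filled which slot.

```rust
// Conceptual model of one round. The keyed xorshift below is a stand-in
// for zipnet's real X25519-derived pads; the fixed slot assignment stands
// in for the blind slot selection of the real protocol.

/// Deterministic pad shared between one client and one committee server.
fn pad(client: u64, server: u64, len: usize) -> Vec<u8> {
    let mut s = client.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ server;
    (0..len)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            s as u8
        })
        .collect()
}

fn xor_into(acc: &mut [u8], v: &[u8]) {
    for (a, b) in acc.iter_mut().zip(v) {
        *a ^= b;
    }
}

fn main() {
    const SLOT_BYTES: usize = 8;
    const LEN: usize = 4 * SLOT_BYTES; // 4 slots
    let servers = [1u64, 2, 3];
    let messages: [&[u8]; 2] = [b"hello   ", b"world   "];

    // Clients: write the message into the own slot, then XOR in one pad
    // per committee server. Each envelope alone looks uniformly random.
    let envelopes: Vec<Vec<u8>> = messages
        .iter()
        .enumerate()
        .map(|(i, msg)| {
            let mut env = vec![0u8; LEN];
            env[i * SLOT_BYTES..(i + 1) * SLOT_BYTES].copy_from_slice(msg);
            for &srv in &servers {
                xor_into(&mut env, &pad(i as u64, srv, LEN));
            }
            env
        })
        .collect();

    // Aggregator: fold every ClientEnvelope into one AggregateEnvelope.
    // Without all server pads this still reveals nothing about authorship.
    let mut aggregate = vec![0u8; LEN];
    for env in &envelopes {
        xor_into(&mut aggregate, env);
    }

    // Committee: each server contributes its partial unblind (the XOR of
    // its pads with every registered client). All pads cancel pairwise.
    for &srv in &servers {
        for client in 0..messages.len() as u64 {
            xor_into(&mut aggregate, &pad(client, srv, LEN));
        }
    }

    assert_eq!(&aggregate[..SLOT_BYTES], &b"hello   "[..]);
    assert_eq!(&aggregate[SLOT_BYTES..2 * SLOT_BYTES], &b"world   "[..]);
}
```

Note that withholding even one server's partial leaves the aggregate unreadable — which is exactly the any-trust property, and also why v1 liveness requires every committee server to be up.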

Minimum viable deployment

Three committee servers + one aggregator + a handful of clients is the smallest deployment where anonymity holds meaningfully. Two committee servers will technically run but any one of them can deanonymize the set — stick to three or more.

     TDX host A           TDX host B           TDX host C
   ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
   │ zipnet-     │      │ zipnet-     │      │ zipnet-     │
   │ server #1   │      │ server #2   │      │ server #3   │
   └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
          └────────────────────┼────────────────────┘
                               │   Raft / mosaik group
                               ▼
                      ┌───────────────────┐
                      │ zipnet-aggregator │   (non-TDX host, well-connected)
                      └─────────┬─────────┘
                                │
                                ▼
                         external publishers
                        (TDX where gated, else
                         operator-trusted hosts)

Each box runs ZIPNET_INSTANCE=acme.mainnet and joins zipnet::UNIVERSE over iroh; mosaik discovery wires the rest.

Running many instances side by side

Operators routinely run several instances — production, a public testnet, internal dev — on the same universe. Each has its own instance name, its own committee, its own MR_TD pin, its own ACL. A single host can serve one instance or several; run a separate systemd unit per instance:

systemctl start zipnet-server@acme.mainnet
systemctl start zipnet-server@preview.alpha
systemctl start zipnet-server@dev.ops

Each unit sets a different ZIPNET_INSTANCE; they share the universe and the discovery layer, and appear to publishers as three distinct Zipnet::bind targets.
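
A template unit behind those commands might look like this sketch (paths are illustrative, not shipped by zipnet; systemd substitutes %i with whatever follows the @):

```ini
# /etc/systemd/system/zipnet-server@.service — illustrative only
[Unit]
Description=zipnet committee server (%i)
After=network-online.target
Wants=network-online.target

[Service]
Environment=ZIPNET_INSTANCE=%i
# One env file per instance: ZIPNET_COMMITTEE_SECRET, ZIPNET_SECRET, round knobs.
EnvironmentFile=/etc/zipnet/%i.env
ExecStart=/usr/local/bin/zipnet-server
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```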

See also

Quickstart — stand up an instance

audience: operators

This page walks you from a fresh checkout to a live zipnet instance that external publishers can reach with one line of code. Read Deployment overview first for the architectural background; this page assumes it.

Who runs a zipnet instance

Typical deployments:

  • A rollup or app offering an encrypted mempool. The team runs the committee; user wallets publish sealed transactions; the sequencer or builder reads them ordered and opaque-to-sender, and decrypts at block-build time via whatever mechanism they prefer (threshold decryption, TEE unsealing).
  • An MEV auction team hosting a permissioned order-flow channel. The team runs the committee; whitelisted searchers publish intents; every connected builder reads the same ordered log.
  • A governance coalition running anonymous signalling. The coalition runs the committee; delegated wallets signal anonymously; anyone can tally.

What’s common: you want a bounded participant set — which you authenticate via TEE attestation and a ticket class — to publish messages without any single party (yourself included) being able to link message to sender. You run the committee and the aggregator. Participants bring their own TEE-attested client software, typically from a TDX image you also publish.

One-paragraph mental model

Zipnet runs as one service among many on a shared mosaik universe — a single NetworkId that hosts zipnet alongside other mosaik services (signers, storage, oracles). Your job as an operator is to stand up an instance of zipnet under a name you pick (e.g. acme.mainnet) and keep it running. External agents bind to your instance with Zipnet::bind(&network, "acme.mainnet") — they compile the name in from their side, so there is no registry to publish to and nothing to advertise. Your servers simply need to be reachable.

What you’re running

A minimum instance is:

| Role | Count | Hosted where |
|---|---|---|
| Committee server | 3 or more (odd) | TDX-enabled hosts you operate |
| Aggregator | 1 (v1) | Any host with outbound UDP |
| (optional) Your own publishing clients | any | TDX-enabled if the instance is gated |

All of these join the same shared mosaik universe. The committee and aggregator advertise on the shared peer catalog; external publishers reach them through mosaik’s discovery without any further config from you.

What defines your instance

Your instance is fully identified by three pieces of configuration:

| # | Field | Notes |
|---|---|---|
| 1 | instance name | Short, stable, namespaced string (e.g. acme.mainnet). Folds into the committee GroupId, submit StreamId, and broadcasts StoreId. |
| 2 | universe NetworkId | Almost always zipnet::UNIVERSE. Override only if you run an isolated federation. |
| 3 | ticket class | What publishers must present: TDX MR_TD, JWT issuer, or both. Also folds into GroupId. |

Round parameters (num_slots, slot_bytes, round_period, round_deadline) are configured per-instance via env vars and published at runtime in the LiveRoundCell collection that publishers read. They are immutable for the instance’s lifetime — bumping any of them requires a new instance name.

Items 1 and 3 fold into the instance’s derived IDs. Change either and the instance’s identity changes, meaning publishers compiled against the old values can no longer bond. See Designing coexisting systems on mosaik for the derivation.

Minimal smoke test

Before you touch hardware, confirm the pipeline works end-to-end on your laptop. The deterministic check is the integration test that exercises three committee servers + one aggregator + two clients over real mosaik transports in one tokio runtime:

cargo test -p zipnet-node --test e2e one_round_end_to_end

A green run in roughly 10 seconds tells you the crypto, consensus, round lifecycle, and mosaik transport are all healthy in your checkout. If it fails, nothing else on this page is going to work — investigate before touching hardware.

Exercising the binaries directly (optional)

If you want to watch the three role binaries run as separate processes — useful for shaking out systemd units, env vars, or firewall rules — bootstrap them by hand on one host. Localhost discovery over fresh iroh relays is slow, so give the first round up to a minute to land.

# terminal 1 — seed committee server; grab its peer= line from stdout
ZIPNET_INSTANCE="dev.local" \
ZIPNET_COMMITTEE_SECRET="dev-committee-secret" \
ZIPNET_SECRET="seed-1" \
./target/debug/zipnet-server

# terminals 2+3 — remaining committee servers, bootstrapped off #1
ZIPNET_INSTANCE="dev.local" \
ZIPNET_COMMITTEE_SECRET="dev-committee-secret" \
ZIPNET_SECRET="seed-2" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
./target/debug/zipnet-server

# terminal 4 — aggregator
ZIPNET_INSTANCE="dev.local" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
./target/debug/zipnet-aggregator

# terminal 5 — reference publisher
ZIPNET_INSTANCE="dev.local" \
ZIPNET_BOOTSTRAP=<peer_id_from_terminal_1> \
ZIPNET_MESSAGE="hello from the smoke test" \
./target/debug/zipnet-client

A healthy run prints round finalized on the committee servers within a minute and the client’s payload echoes back on the subscriber side. TDX is off in this mode — production instances re-enable it (see below).

What every server process does for you

When zipnet-server starts it:

  1. Joins the shared universe network (zipnet::UNIVERSE, or whatever you set ZIPNET_UNIVERSE to).
  2. Derives every instance-local id from ZIPNET_INSTANCE — committee GroupId, the submit stream, the broadcasts collection, the registries.
  3. Bonds with its peers using the committee secret and TDX measurement.
  4. Advertises itself on the shared peer catalog via mosaik’s standard /mosaik/announce gossip. Publishers that compile in the same instance name reach the same GroupId and bond automatically.
  5. Accepts rounds from the aggregator and replicates broadcasts through the committee Raft group.

You do not configure streams, collections, or group ids by hand, and you do not publish an announcement anywhere. The instance name is the only piece of identity you manage; everything else is either derived or taken care of by mosaik.

Building a TDX image (production path)

For production, every committee server and every publishing client runs inside a TDX guest. Mosaik ships the image builder — you do not compose QEMU, OVMF, kernels, and initramfs yourself, and you do not compute MR_TD by hand.

In the committee server crate’s build.rs:

// crates/zipnet-server/build.rs
fn main() {
    mosaik::tee::tdx::build::ubuntu()
        .with_default_memory_size("4G")
        .build();
}

Add to Cargo.toml:

[dependencies]
mosaik = { version = "0.3", features = ["tdx"] }

[build-dependencies]
mosaik = { version = "0.3", features = ["tdx-builder-ubuntu"] }

After cargo build --release you get, in target/release/tdx-artifacts/zipnet-server/ubuntu/:

| Artifact | What it’s for |
|---|---|
| zipnet-server-run-qemu.sh | Self-extracting launcher. This is what you invoke on a TDX host. |
| zipnet-server-mrtd.hex | The 48-byte measurement. Publishers pin against this. |
| zipnet-server-vmlinuz | Raw kernel, in case you repackage. |
| zipnet-server-initramfs.cpio.gz | Raw initramfs. |
| zipnet-server-ovmf.fd | Raw OVMF firmware. |

Mosaik computes MR_TD at build time by parsing the OVMF, the kernel and the initramfs according to the TDX spec — the same value the TDX hardware will report at runtime. You ship this hex string alongside your announcement; a client whose own image does not measure to the same MR_TD cannot join the instance. See users/handshake-with-operator for the matching client-side flow.

Alpine variant (mosaik::tee::tdx::build::alpine(), feature tdx-builder-alpine) produces a ~5 MB image versus Ubuntu’s ~25 MB, at the cost of musl. Use Alpine for publishers where image size matters; keep Ubuntu for committee servers unless you have a specific reason otherwise.

Instance naming and your users’ handshake

Publishers bond to your instance by knowing three things: the universe NetworkId, the instance name, and (if TDX-gated) the MR_TD of your committee image. That is the complete handoff — no registry, no dynamic lookup, no on-network advertisement.

Publish these via whatever channel suits your users: release notes, a docs page, direct handoff in a setup email. Users bake the instance name (or its derived UniqueId) into their code at compile time.

Instance names share a flat namespace per universe. Two operators picking the same name collide in the committee group and neither works correctly — mosaik has no mechanism to prevent this and no way to tell you it happened. Namespace aggressively: <org>.<purpose>.<env>, for example acme.mixer.mainnet. If in doubt, include an irrevocable random suffix once and forget about it (acme.mixer.mainnet.8f3c1a).
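
A one-time way to mint such a suffixed name (a sketch; assumes openssl is on PATH — any source of 6 hex digits works just as well):

```shell
# Generate the suffix once, then hard-code the resulting name everywhere.
ZIPNET_INSTANCE="acme.mixer.mainnet.$(openssl rand -hex 3)"
echo "$ZIPNET_INSTANCE"
```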

Retiring an instance is just stopping every server under that name. Publishers still trying to bond will see ConnectTimeout; they update their code to the new name and rebuild.

Going live

Once the smoke test passes on staging hardware:

  1. Build your production TDX images (committee + client). Publish the two mrtd.hex values to whatever channel your users consume (docs site, release notes, signed announcement).
  2. Stand up three TDX committee servers on geographically separate hosts, with the production ZIPNET_INSTANCE and ZIPNET_COMMITTEE_SECRET.
  3. Stand up the aggregator on a non-TDX but well-connected host.
  4. Verify the committee has elected a leader and the aggregator is bonded to the submit stream. Your own aggregator metrics are the easiest check; on the committee side, exactly one server should report mosaik_groups_leader_is_local = 1.
  5. Hand publishers your instance name, one universe bootstrap PeerId, and (if TDX-gated) your committee MR_TD. That is the entirety of their onboarding.

Running many instances side by side

Operators routinely run several instances — production, a public testnet, internal dev — on the same universe. Each has its own instance name, its own committee, its own MR_TD pin, its own ACL. A single host can serve one instance or several; run a separate systemd unit per instance:

systemctl start zipnet-server@acme.mainnet
systemctl start zipnet-server@preview.alpha
systemctl start zipnet-server@dev.ops

Each unit sets a different ZIPNET_INSTANCE; they share the universe and the discovery layer, and appear to publishers as three distinct Zipnet::bind targets.

Next reading

audience: operators

End-to-end deploy example — one TDX host

A worked, copy-pasteable runbook that stands up a complete zipnet instance on a single TDX-capable host reachable at ubuntu@tdx-host. The topology is the minimum viable deployment: three committee servers, one aggregator, one reference publisher, all co-located as separate TDX guests (plus one non-TDX process for the aggregator) on the same physical host.

Use this recipe for staging, integration, or a demo. For production, split the three committee servers onto three independently-operated TDX hosts — the steps per host are identical; only the bootstrap wiring changes.

What you are about to build

                    ubuntu@tdx-host  (one physical TDX server)
  ┌──────────────────────────────────────────────────────────────┐
  │  TDX guest #1        TDX guest #2        TDX guest #3        │
  │  zipnet-server-1     zipnet-server-2     zipnet-server-3     │
  │        │                    │                    │           │
  │        └────── Raft / mosaik group (committee) ──┘           │
  │                             │                                │
  │               ┌─────────────▼──────────────┐                 │
  │               │ zipnet-aggregator (no TDX) │                 │
  │               └─────────────┬──────────────┘                 │
  │                             │                                │
  │                ┌────────────▼────────────┐                   │
  │                │  TDX guest #4           │                   │
  │                │  zipnet-client (demo)   │                   │
  │                └─────────────────────────┘                   │
  └──────────────────────────────────────────────────────────────┘

The instance name used throughout is demo.tdx. Swap it for your own namespaced name before running anything in production (<org>.<purpose>.<env>; see Quickstart — naming the instance).

Prerequisites

On your workstation:

  • A checkout of this repo.
  • Rust 1.93 (rustup show should report the toolchain pinned by rust-toolchain.toml).
  • SSH access to the host: ssh ubuntu@tdx-host returns a shell.
  • scp and rsync available locally.

On ubuntu@tdx-host:

  • Bare-metal or cloud host with Intel TDX enabled in BIOS and a TDX-capable kernel installed: /dev/tdx_guest exists on the host, and the kvm_intel kernel module is loaded with tdx=Y. If you are unsure, run dmesg | grep -i tdx.
  • qemu-system-x86_64 at a version the mosaik launcher supports (8.2+). The launcher script will tell you if the local QEMU is too old.
  • A user that can access /dev/kvm and /dev/tdx_guest without root. On Ubuntu, add ubuntu to the kvm and tdx groups.
  • tmux (used below to keep each role’s logs visible). Any process supervisor works — systemd user units, screen, nohup. The commands that follow use tmux because it is the lowest-ceremony option.
  • Outbound UDP to the internet for iroh / QUIC and mosaik relays. No inbound ports need to be opened — mosaik’s hole-punching layer handles reachability.
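
The host-side checks above can be bundled into a small report script (a hypothetical helper, not part of zipnet — it only prints, never fails, so run it freely and adapt the checks):

```shell
# Preflight report for the prerequisites above. Prints "ok" or "MISS"
# per check; it never exits non-zero, so it is safe in any shell.
preflight() {
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $desc"
  else
    echo "MISS $desc"
  fi
}

preflight "TDX guest device present"  test -e /dev/tdx_guest
preflight "/dev/kvm accessible"       test -r /dev/kvm
preflight "qemu-system-x86_64 found"  command -v qemu-system-x86_64
preflight "tmux installed"            command -v tmux
```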

Two small decisions fixed for this example:

| Knob | Value used here | Why |
|---|---|---|
| ZIPNET_INSTANCE | demo.tdx | Short, obvious, collision-unlikely. Rename freely. |
| ZIPNET_COMMITTEE_SECRET | openssl rand -hex 32 once, pasted into the env for all three servers | Shared admission secret for the committee. Clients and the aggregator must not see this value. |
| ZIPNET_MIN_PARTICIPANTS | 1 | So the single demo client triggers rounds. Raise to >=2 for real anonymity. |
| ZIPNET_ROUND_PERIOD | 3s | Enough headroom on a shared host to see logs land in order. |

Step 1 — Build the TDX artifacts on your workstation

From the repo root, build everything release-mode. The build.rs scripts in zipnet-server and zipnet-client invoke the mosaik TDX builder and drop launchable artifacts under target/release/tdx-artifacts/.

cargo build --release

When this finishes you have:

target/release/
  zipnet-aggregator                                 # plain binary; runs on any host
  tdx-artifacts/
    zipnet-server/ubuntu/
      zipnet-server-run-qemu.sh                     # self-extracting launcher
      zipnet-server-mrtd.hex                        # 48-byte committee measurement
      zipnet-server-vmlinuz
      zipnet-server-initramfs.cpio.gz
      zipnet-server-ovmf.fd
    zipnet-client/alpine/
      zipnet-client-run-qemu.sh
      zipnet-client-mrtd.hex                        # 48-byte client measurement
      zipnet-client-vmlinuz
      zipnet-client-initramfs.cpio.gz
      zipnet-client-ovmf.fd

Record both mrtd.hex values — these are the MR_TDs you will publish to readers alongside the instance name.

SERVER_MRTD=$(cat target/release/tdx-artifacts/zipnet-server/ubuntu/zipnet-server-mrtd.hex)
CLIENT_MRTD=$(cat target/release/tdx-artifacts/zipnet-client/alpine/zipnet-client-mrtd.hex)
echo "committee MR_TD: $SERVER_MRTD"
echo "client    MR_TD: $CLIENT_MRTD"

Step 2 — Copy artifacts to the host

ssh ubuntu@tdx-host 'mkdir -p ~/zipnet/{server,client,aggregator,logs}'

rsync -avz --delete \
  target/release/tdx-artifacts/zipnet-server/ubuntu/ \
  ubuntu@tdx-host:~/zipnet/server/

rsync -avz --delete \
  target/release/tdx-artifacts/zipnet-client/alpine/ \
  ubuntu@tdx-host:~/zipnet/client/

scp target/release/zipnet-aggregator \
  ubuntu@tdx-host:~/zipnet/aggregator/

The launcher scripts are self-extracting — they embed kernel, initramfs, and OVMF. You do not need to copy the raw vmlinuz / initramfs / ovmf.fd files unless you plan to repackage.

Step 3 — Pick a committee secret

On the TDX host, once, generate the shared committee secret and park it in a file you will source into each server’s environment. Anyone with this value can join the committee, so treat it as a root credential.

ssh ubuntu@tdx-host
# on the host
umask 077
openssl rand -hex 32 > ~/zipnet/committee-secret
chmod 600 ~/zipnet/committee-secret

Step 4 — Start the first committee server and capture its PeerId

The first server has no one to bootstrap against, so it starts without ZIPNET_BOOTSTRAP. Its startup line prints peer=<hex>… — capture that and reuse it as the bootstrap hint for every following process.

Open a tmux session on the host and start server 1:

# on the host
tmux new-session -d -s zipnet-s1 -n server-1
tmux send-keys -t zipnet-s1:server-1 "
  ZIPNET_INSTANCE=demo.tdx \
  ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
  ZIPNET_SECRET=server-1-seed \
  ZIPNET_MIN_PARTICIPANTS=1 \
  ZIPNET_ROUND_PERIOD=3s \
  ZIPNET_ROUND_DEADLINE=15s \
  RUST_LOG=info,zipnet_node=info \
  ~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-1.log
" C-m

Wait five or ten seconds for the TDX guest to come up, then pull the PeerId out of the log:

# on the host
BOOTSTRAP=$(grep -oE 'peer=[0-9a-f]{10,}' ~/zipnet/logs/server-1.log | head -1 | cut -d= -f2)
echo "bootstrap peer: $BOOTSTRAP"

If $BOOTSTRAP is empty, the guest has not finished booting — the first round of QEMU + TDX can take 30 s on a cold host. Re-run the grep after a beat.

What if I don’t see the peer= line? The self-extracting launcher prints its own boot banner first. The zipnet line (zipnet up: network=<universe> instance=demo.tdx peer=...) only appears once the binary inside the guest has announced. If it is still missing after a minute, less ~/zipnet/logs/server-1.log and look for QEMU-level errors — typically TDX not enabled, or /dev/kvm permissions.
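
If you script this step, a small polling wrapper around the same grep saves the manual retry (a hypothetical helper; the peer= pattern is the one the launcher prints):

```shell
# Poll a log file until the peer id shows up, retrying every 2 s.
# Usage:  BOOTSTRAP=$(wait_for_peer ~/zipnet/logs/server-1.log)
wait_for_peer() {
  log="$1"; tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    peer=$(grep -oE 'peer=[0-9a-f]{10,}' "$log" 2>/dev/null | head -1 | cut -d= -f2)
    if [ -n "$peer" ]; then
      echo "$peer"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "no peer= line in $log after timeout" >&2
  return 1
}
```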

Step 5 — Start the remaining two committee servers

Each server gets a distinct ZIPNET_SECRET (so each derives a unique PeerId) and bootstraps against server 1.

# on the host — still inside your SSH session
tmux new-session -d -s zipnet-s2 -n server-2
tmux send-keys -t zipnet-s2:server-2 "
  ZIPNET_INSTANCE=demo.tdx \
  ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
  ZIPNET_SECRET=server-2-seed \
  ZIPNET_BOOTSTRAP=$BOOTSTRAP \
  ZIPNET_MIN_PARTICIPANTS=1 \
  ZIPNET_ROUND_PERIOD=3s \
  ZIPNET_ROUND_DEADLINE=15s \
  RUST_LOG=info,zipnet_node=info \
  ~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-2.log
" C-m

tmux new-session -d -s zipnet-s3 -n server-3
tmux send-keys -t zipnet-s3:server-3 "
  ZIPNET_INSTANCE=demo.tdx \
  ZIPNET_COMMITTEE_SECRET=\$(cat ~/zipnet/committee-secret) \
  ZIPNET_SECRET=server-3-seed \
  ZIPNET_BOOTSTRAP=$BOOTSTRAP \
  ZIPNET_MIN_PARTICIPANTS=1 \
  ZIPNET_ROUND_PERIOD=3s \
  ZIPNET_ROUND_DEADLINE=15s \
  RUST_LOG=info,zipnet_node=info \
  ~/zipnet/server/zipnet-server-run-qemu.sh 2>&1 | tee ~/zipnet/logs/server-3.log
" C-m

Within 15–30 s, one of the three servers should log committee: opening round at index I_1. That one is the current Raft leader; the other two are followers. Which server wins the election is not deterministic — do not special-case the first server as “always the leader”.

Confirm the committee is healthy:

# on the host
grep -E 'zipnet up|leader|round' ~/zipnet/logs/server-*.log | tail -20

Step 6 — Start the aggregator

The aggregator is the only non-TDX process. It bootstraps against any committee server and must not be given the committee secret.

# on the host
tmux new-session -d -s zipnet-agg -n aggregator
tmux send-keys -t zipnet-agg:aggregator "
  ZIPNET_INSTANCE=demo.tdx \
  ZIPNET_SECRET=aggregator-seed \
  ZIPNET_BOOTSTRAP=$BOOTSTRAP \
  ZIPNET_FOLD_DEADLINE=2s \
  RUST_LOG=info,zipnet_node=info \
  ~/zipnet/aggregator/zipnet-aggregator 2>&1 | tee ~/zipnet/logs/aggregator.log
" C-m

A healthy aggregator settles quickly and logs aggregator booting; waiting for collections to come online within a few seconds.

Step 7 — Start the reference client

# on the host
tmux new-session -d -s zipnet-c1 -n client-1
tmux send-keys -t zipnet-c1:client-1 "
  ZIPNET_INSTANCE=demo.tdx \
  ZIPNET_BOOTSTRAP=$BOOTSTRAP \
  ZIPNET_MESSAGE='hello from ubuntu@tdx-host' \
  ZIPNET_CADENCE=1 \
  RUST_LOG=info,zipnet_node=info \
  ~/zipnet/client/zipnet-client-run-qemu.sh 2>&1 | tee ~/zipnet/logs/client-1.log
" C-m

Within one ZIPNET_ROUND_PERIOD (3s here) after the aggregator bonds, the Raft leader should print:

INFO zipnet_node::committee: committee: opening round at index I_1
INFO zipnet_node::roles::server: submitted partial unblind at I_2
INFO zipnet_node::committee: committee: round finalized round=r1 participants=1

Step 8 — Verify end-to-end

From the host, tail all four log streams at once:

# on the host
tail -F ~/zipnet/logs/server-*.log ~/zipnet/logs/aggregator.log ~/zipnet/logs/client-1.log

You are looking for:

| Signal | Where | Meaning |
|---|---|---|
| zipnet up: network=<universe> instance=demo.tdx | every role | Universe join and instance binding succeeded. |
| mosaik_groups_leader_is_local = 1 on exactly one server (Prometheus or log line) | server logs | Committee has a single Raft leader. |
| aggregator: forwarded aggregate to committee round=rN participants=1 | aggregator | Client envelopes reached the aggregator and were folded. |
| committee: round finalized round=rN participants=1 | whichever server is leader | End-to-end round closed; broadcast published into the Broadcasts collection. |

Once you see round finalized with a non-zero participants count, the topology is working.

Cleanup

# on the host
for s in zipnet-s1 zipnet-s2 zipnet-s3 zipnet-agg zipnet-c1; do
  tmux kill-session -t $s 2>/dev/null || true
done

Each TDX guest emits a departure announcement over gossip on SIGTERM, and Raft carries on as long as a majority of the committee remains up; kill-session sends SIGTERM to the foreground QEMU process, which in turn signals the guest.

If a guest is wedged, pkill -f zipnet-server-run-qemu.sh is safe — all in-memory state is disposable in v1.

What to change for a real deployment

This example collapses a three-node committee onto one host to keep the runbook short. To roll the same shape into production:

  1. Replace ubuntu@tdx-host with three separate TDX hosts ubuntu@tdx-1, ubuntu@tdx-2, ubuntu@tdx-3 run by three independent operators (or at minimum, with three independent blast radii). Geographic separation is the point.
  2. Run the aggregator on a fourth, non-TDX but well-connected host. Clients will often use it as a bootstrap; pick something with a stable address.
  3. Swap tmux for systemd unit files — one per role — so crash recovery is automatic. See Running a committee server for the full production env matrix.
  4. Bump ZIPNET_MIN_PARTICIPANTS to at least 2. A single client produces no anonymity.
  5. Publish the instance name, universe NetworkId, and the two MR_TDs ($SERVER_MRTD, $CLIENT_MRTD) to your users through release notes or a signed announcement. That is the entire onboarding handoff; see What you need from the operator for the matching reader side.

See also

Running a committee server

audience: operators

A committee server joins the Raft group that orchestrates the instance’s rounds, holds one of the X25519 keys used to unblind the broadcast vector, and publishes its public bundle into the replicated ServerRegistry. In production it runs inside a TDX guest built from the mosaik image builder; see the Quickstart TDX section.

One-shot command

ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_COMMITTEE_SECRET="your-committee-secret" \
ZIPNET_SECRET="stable-node-seed" \
ZIPNET_MIN_PARTICIPANTS=2 \
ZIPNET_ROUND_PERIOD=3s \
ZIPNET_ROUND_DEADLINE=15s \
./zipnet-server --bootstrap <peer_id_of_another_server>

On a fresh universe with no existing seed peers, start the first server without --bootstrap, grab the peer=… value printed at startup, and pass it as --bootstrap to the remaining servers. Every subsequent server, aggregator, or client can be bootstrapped off any one of them. After the universe has settled, the mosaik discovery layer finds peers on its own and the bootstrap hint is only needed for cold starts.

Environment variables

The full list lives in Environment variables. The ones you will actually set in production:

| Variable | Meaning | Notes |
|---|---|---|
| ZIPNET_INSTANCE | Instance name this server serves | Required. Short, stable, namespaced (e.g. acme.mainnet). Must match across the whole deployment. |
| ZIPNET_UNIVERSE | Universe override | Optional. Leave unset to use zipnet::UNIVERSE (the shared mosaik universe). Set only for isolated federations. |
| ZIPNET_COMMITTEE_SECRET | Shared committee admission secret | Treat as root credential. Identical on every committee member of this instance. |
| ZIPNET_SECRET (or --secret) | Seed for this node’s stable PeerId | Unique per node. Anything not 64-hex is blake3-hashed. |
| ZIPNET_BOOTSTRAP | Peer IDs to dial on startup | Helpful on cold universes; unnecessary once discovery has converged. |
| ZIPNET_MIN_PARTICIPANTS | Minimum clients before the leader opens a round | Default 1. Set to at least 2 for meaningful anonymity. |
| ZIPNET_ROUND_PERIOD | How often the leader attempts to open a round | e.g. 2s, 500ms. |
| ZIPNET_ROUND_DEADLINE | Max time a round may stay open | e.g. 15s. The leader will force-advance a stuck round. |
| ZIPNET_METRICS | Bind address for the Prometheus exporter | e.g. 0.0.0.0:9100. |
| RUST_LOG | Log filter | Sane default: info,zipnet_node=info,mosaik=warn. |

Naming the instance

Instance names share a flat namespace per universe. Two operators picking the same name collide in the same committee group and neither deployment works — mosaik has no way to prevent or detect this. Namespace aggressively: <org>.<purpose>.<env>, for example acme.mixer.mainnet. If unsure, add a random suffix once and forget about it (acme.mixer.mainnet.8f3c1a).

What a healthy startup looks like

INFO zipnet_server: spawning zipnet server server=a2095bed48
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=f5e28a69e6... role=3b37e5d575...
INFO zipnet_node::roles::server: server booting; waiting for collections + group
INFO zipnet_node::committee: committee: opening round at index I_1
INFO zipnet_node::roles::server: submitted partial unblind at I_2
INFO zipnet_node::committee: committee: round finalized round=r1 participants=N

A server that has been up for more than a minute and has not printed round finalized yet is almost always waiting on one of:

  1. Client count below ZIPNET_MIN_PARTICIPANTS. Check the aggregator’s zipnet_client_registry_size metric.
  2. Committee group has not elected a leader. Check mosaik_groups_leader_is_local on each server; exactly one should be 1.
  3. Bundle tickets not replicated. See Incident response — stuck rounds.

Resource profile

A single round at the default RoundParams (64 slots × 256 bytes = 16 KiB broadcast vector) with 100 clients uses roughly:

  • CPU: a burst of ~5 ms per round per client (pad derivation dominates).
  • RAM: O(N) client bundles × 64 bytes + a ring buffer of recent aggregates.
  • Network: inbound one aggregate envelope per round (+ Raft heartbeat traffic between servers), outbound one partial per round + Raft replication to followers.
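The geometry arithmetic above is worth having in script form when you tune RoundParams. A minimal sketch using the default values quoted on this page (the constants are this page's defaults, not read from zipnet):

```rust
/// Default v1 round geometry from this page: 64 slots of 256 bytes each.
const NUM_SLOTS: usize = 64;
const SLOT_BYTES: usize = 256;

/// Broadcast vector size B: clients, aggregator, and committee all move
/// whole vectors of this size each round.
fn broadcast_vector_bytes() -> usize {
    NUM_SLOTS * SLOT_BYTES
}

fn main() {
    let b = broadcast_vector_bytes();
    assert_eq!(b, 16 * 1024); // 16 KiB, matching the text
    // O(N) bundle memory for 100 clients at 64 bytes per bundle:
    let bundle_bytes = 100 * 64;
    println!("B = {} KiB, client bundles = {} KiB", b / 1024, bundle_bytes / 1024);
}
```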

Graceful shutdown

Send SIGTERM. The server emits a departure announcement over gossip so peers learn within the next announce cycle (default 15 s) that it is gone. Raft proceeds with the remaining quorum provided a majority is still up.

Availability warning

In v1, any committee server going offline halts round progression because the state machine waits for one partial per server listed in the round’s roster. This is by design — the paper’s any-trust model prioritizes correctness over liveness. A v2 improvement is sketched in Roadmap to v2.

See also

Running the aggregator

audience: operators

The aggregator receives every client envelope for the live round, XORs them into a single AggregateEnvelope, and forwards that to the committee. It is untrusted for anonymity — compromising it only affects liveness and round-membership accounting, never whether a message can be linked to its sender. It is trusted for liveness: if it stops, rounds stop.

In v1 there is exactly one aggregator per instance. It does not need to run inside a TDX guest (though you can if your ops story prefers uniformity).
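Why the aggregator can be untrusted for anonymity falls out of the XOR structure. A toy fold in the shape of the aggregator's job, assuming the paper's DC-net construction (each envelope is the client's slot vector XORed with one pad per committee server; folding all envelopes and then all server pads cancels every pad). The pad derivation here is a stand-in hash, not zipnet's real KDF:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const B: usize = 8; // tiny vector for the demo; real B defaults to 16 KiB

/// Stand-in pad derivation: deterministic in (client, server, round).
fn pad(client: u64, server: u64, round: u64) -> [u8; B] {
    let mut out = [0u8; B];
    for (i, byte) in out.iter_mut().enumerate() {
        let mut h = DefaultHasher::new();
        (client, server, round, i as u64).hash(&mut h);
        *byte = h.finish() as u8;
    }
    out
}

fn xor_into(acc: &mut [u8; B], v: &[u8; B]) {
    for i in 0..B { acc[i] ^= v[i]; }
}

fn main() {
    let (round, clients, servers) = (1u64, [10u64, 11], [100u64, 101]);
    let messages: [[u8; B]; 2] = [*b"hello\0\0\0", [0u8; B]]; // one talker, one cover client

    // Each client seals: message XOR (one pad per committee server).
    let mut aggregate = [0u8; B];
    for (c, msg) in clients.iter().zip(messages.iter()) {
        let mut env = *msg;
        for s in &servers { xor_into(&mut env, &pad(*c, *s, round)); }
        xor_into(&mut aggregate, &env); // the aggregator's entire job
    }
    // Each committee server contributes its partial unblind: XOR of its pads.
    for s in &servers {
        for c in &clients { xor_into(&mut aggregate, &pad(*c, *s, round)); }
    }
    assert_eq!(&aggregate, b"hello\0\0\0"); // pads cancel; XOR of messages remains
}
```

The aggregate it forwards is uniformly random to anyone missing even one server's pads, which is why compromising the aggregator costs liveness but never unlinkability.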

One-shot command

ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_SECRET="stable-agg-seed" \
ZIPNET_FOLD_DEADLINE=2s \
./zipnet-aggregator --bootstrap <peer_id_of_a_committee_server>

Environment variables

| Variable | Meaning | Notes |
|---|---|---|
| ZIPNET_INSTANCE | Instance name this aggregator serves | Required. Must match the committee’s. Typos show up as ConnectTimeout at round-open time. |
| ZIPNET_UNIVERSE | Universe override | Optional; leave unset to use the shared universe. |
| ZIPNET_SECRET (or --secret) | Seed for this aggregator’s stable PeerId | Strongly recommended: clients often use the aggregator as a discovery bootstrap. |
| ZIPNET_BOOTSTRAP | Peer IDs to dial on startup | At least one committee server on a cold universe. |
| ZIPNET_FOLD_DEADLINE | Time window to collect envelopes after a round opens | Default 2s. Raising it admits slower clients at the cost of latency. |
| ZIPNET_METRICS | Prometheus bind address | Optional. |

The aggregator does not take ZIPNET_COMMITTEE_SECRET. It is outside the committee’s trust boundary by design; do not give it that secret even if your secret store makes it convenient.

What a healthy aggregator log looks like

INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=4c210e8340... role=5ef6c4ada2...
INFO zipnet_node::roles::aggregator: aggregator booting; waiting for collections to come online
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r1 participants=3
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r2 participants=3
...

Capacity planning

Per round the aggregator:

  • Receives N × B bytes from clients, where N is the number of active clients and B is the broadcast vector size (defaults to 16 KiB).
  • Sends one aggregate of size B to every committee server.

If the committee is 5 servers and the instance has 1000 clients with default parameters:

  • Inbound per round ≈ 1000 × 16 KiB = 16 MiB.
  • Outbound per round ≈ 5 × 16 KiB = 80 KiB.

At a 2 s round cadence, inbound averages 64 Mbit/s. Provision accordingly.
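The same estimate as a function you can rerun for your own fleet size (a back-of-envelope helper, not a zipnet API; note 1000 × 16 KiB is 15.6 MiB exactly, which the text rounds to 16 MiB):

```rust
/// Sustained inbound bitrate at the aggregator: N clients each sending a
/// B-byte envelope every `period_secs` seconds.
fn inbound_bits_per_sec(n: u64, b: u64, period_secs: u64) -> u64 {
    n * b * 8 / period_secs
}

fn main() {
    let (n, s, b) = (1000u64, 5u64, 16 * 1024u64);
    assert_eq!(n * b, 16_384_000);   // ≈ 16 MiB inbound per round
    assert_eq!(s * b, 80 * 1024);    // 80 KiB outbound per round
    // ≈ 65.5 Mbit/s sustained at a 2 s cadence; the text's "64" comes
    // from rounding the per-round volume to 16 MiB first.
    println!("{} bit/s", inbound_bits_per_sec(n, b, 2));
}
```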

Graceful shutdown

SIGTERM. Envelopes that had not yet been folded into the current round’s aggregate are dropped on the floor; the affected clients retry on the next round automatically.

Because the aggregator is a single point of failure for liveness in v1, plan restarts against your monitoring: a round stall of 3 × ROUND_PERIOD + ROUND_DEADLINE triggers the stuck-round alert documented in Monitoring.

What if I want two aggregators?

Not supported in v1. Running two on the same instance name gets you two processes competing for the submit stream, not load-balancing. If you need redundancy today, fail over with a warm-standby host behind a process supervisor — not two live aggregators. A multi-tier aggregator tree is sketched in Roadmap to v2 — Multi-tier aggregators.

See also

Running a client

audience: operators

The typical zipnet publisher is an external user running their own TDX-attested agent — you don’t operate those. This page is about the reference zipnet-client binary you ship to publishers (or run yourself for a bundled wallet, a cover-traffic filler, or a smoke-test participant).

A client generates an X25519 keypair, publishes its public bundle via gossip, and seals one envelope per round. In production every client runs inside a TDX guest whose MR_TD matches the value your committee pinned; see Quickstart TDX section.

One-shot command

ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_MESSAGE="payload-to-broadcast" \
./zipnet-client --bootstrap <peer_id_of_aggregator_or_server>

Omit ZIPNET_MESSAGE to run a cover-traffic client that participates in every round with a zero payload. Cover traffic is the operator’s tool for raising the effective anonymity set size when real publishers are sparse.

Environment variables

| Variable | Meaning | Notes |
|---|---|---|
| ZIPNET_INSTANCE | Instance name to bind to | Required. Same string the committee uses; typos show up as ConnectTimeout. |
| ZIPNET_UNIVERSE | Universe override | Optional; leave unset to use the shared universe. |
| ZIPNET_BOOTSTRAP | Peer IDs to dial on startup | Aggregator’s PeerId or any committee server’s. Needed only on cold networks. |
| ZIPNET_MESSAGE | UTF-8 message to seal per round | Truncate yourself to fit slot_bytes − tag_len. Default slot width is 240 bytes of user payload. |
| ZIPNET_CADENCE | Talk every Nth round | Default 1. Useful for dialing your own talk/cover ratio. |
| ZIPNET_METRICS | Prometheus bind address | Optional. |
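Because the payload limit is in bytes while ZIPNET_MESSAGE is UTF-8, naive byte truncation can split a multi-byte character. A minimal helper (hypothetical, not shipped with zipnet-client) that truncates on a code-point boundary:

```rust
/// Truncate a UTF-8 message to at most `max` bytes without splitting a
/// code point, so it fits the slot payload (slot_bytes − tag_len, 240
/// bytes at the defaults quoted on this page).
fn truncate_utf8(msg: &str, max: usize) -> &str {
    if msg.len() <= max { return msg; }
    let mut end = max;
    // Walk back to the nearest char boundary; slicing mid-char would panic.
    while !msg.is_char_boundary(end) { end -= 1; }
    &msg[..end]
}

fn main() {
    assert_eq!(truncate_utf8("short", 240), "short");
    assert!(truncate_utf8("héllo", 2).is_char_boundary(1));
}
```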

Building the TDX image you ship to publishers

Publishers to a TDX-gated instance need to run your client image (not their own ad-hoc build), because the committee will reject any client whose quote doesn’t match the pinned MR_TD. Build it the same way you build the server image — mosaik ships the builder:

// crates/zipnet-client/build.rs
fn main() {
    mosaik::tee::tdx::build::alpine()
        .with_default_memory_size("512M")
        .build();
}

# crates/zipnet-client/Cargo.toml
[dependencies]
mosaik = { version = "0.3", features = ["tdx"] }

[build-dependencies]
mosaik = { version = "0.3", features = ["tdx-builder-alpine"] }

Alpine is the usual choice for clients — ~5 MB versus Ubuntu’s ~25 MB — unless your agent has a specific glibc dependency. After cargo build --release the artifacts land under target/release/tdx-artifacts/zipnet-client/alpine/:

| Artifact | What it’s for |
|---|---|
| zipnet-client-run-qemu.sh | Self-extracting launcher publishers invoke on a TDX host. |
| zipnet-client-mrtd.hex | The 48-byte measurement. You pin this in the committee and publish it to readers. |
| zipnet-client-vmlinuz | Raw kernel, for repackaging. |
| zipnet-client-initramfs.cpio.gz | Raw initramfs. |
| zipnet-client-ovmf.fd | Raw OVMF firmware. |

Publish zipnet-client-mrtd.hex alongside your release notes. It goes into the committee’s Tdx::require_mrtd(...) configuration and into readers’ verification code. See Rotations and upgrades for rolling a new MR_TD without downtime.
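Before pinning a freshly built measurement, it is worth sanity-checking the artifact: MR_TD is a 48-byte (SHA-384-sized) digest, so the hex file must hold exactly 96 hex characters. A hypothetical validator sketch (not part of the zipnet tooling):

```rust
/// Sanity-check an mrtd.hex artifact before wiring it into require_mrtd:
/// reject anything that is not exactly 48 bytes of hex.
fn parse_mrtd(hex: &str) -> Result<Vec<u8>, String> {
    let s = hex.trim();
    if !s.is_ascii() || s.len() != 96 {
        return Err(format!("expected 96 hex chars, got {}", s.len()));
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    let ok = "ab".repeat(48);
    assert_eq!(parse_mrtd(&ok).unwrap().len(), 48);
    assert!(parse_mrtd("deadbeef").is_err()); // wrong length → reject early
}
```

Catching a truncated or doubled hex string here is much cheaper than debugging a fleet of publishers whose quotes mysteriously fail attestation.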

What a healthy client log looks like

INFO zipnet_client: spawning zipnet client client=550fda1ffa
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=c2e9aeee0e... role=a8b7ed5911...
INFO zipnet_node::roles::client: client booting; waiting for rosters

After boot, every sealed envelope is a DEBUG event. Raise RUST_LOG to debug,zipnet_node=debug to see them.

Why a client’s envelope might get dropped

  • The client bundle hasn’t replicated yet. The first few rounds after a client connects may not include it in ClientRegistry. Wait for zipnet_client_registered to flip to 1 before relying on anonymity guarantees.
  • Slot collision with another client. v1’s slot assignment is a deterministic hash — two clients occasionally pick the same slot and XOR their messages into garbage. Neither falsification tag verifies, the committee still publishes the broadcast, the messages are lost, the clients retry next round. A 4x-oversized scheduling vector in v2 makes this rare.
  • Message is longer than slot_bytes − tag_len. The client exits with MessageTooLong. Shorten, or raise slot_bytes at the instance level (which retires the instance — see Rotations and upgrades).
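The slot-collision rate in the second bullet is just the birthday bound over the slot count. A quick estimate, assuming the deterministic hash behaves uniformly (the 64-slot default is from this book; the formula is standard, not zipnet code):

```rust
/// Probability that at least two of `n` talking clients hash into the
/// same slot when a uniform deterministic hash assigns them among `k`
/// slots (the birthday bound; v1 default is k = 64).
fn collision_probability(n: u64, k: u64) -> f64 {
    let mut p_none = 1.0;
    for i in 0..n {
        p_none *= 1.0 - (i as f64) / (k as f64);
    }
    1.0 - p_none
}

fn main() {
    // With 10 active talkers in 64 slots, roughly half of all rounds see
    // a collision — which is why the round still publishes and the
    // affected clients simply retry.
    let p = collision_probability(10, 64);
    println!("p(collision, n=10, k=64) = {p:.2}");
}
```

Cover clients sending the zero payload do not corrupt anyone's slot, so only concurrently talking clients count toward n.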

Identity lifetime

In the mock path (TDX disabled), each process run generates a fresh X25519 identity — run-to-run unlinkability is free. In the TDX path, the identity lives in sealed storage inside the enclave so a restart preserves it; useful for reputation systems, but means the same enclave is recognizable across runs. Design accordingly when you pick a cover-traffic cadence.

See also

Rotations and upgrades

audience: operators

Every routine change in a running instance falls into one of these procedures. Follow them verbatim; the consensus and crypto are unforgiving about accidental divergence.

Rolling a committee server (restart, same identity)

Safe any time. Minority-restart is handled by Raft automatically.

  1. Stop the target server with SIGTERM. Wait for graceful exit (under 5 s).
  2. Replace the binary / restart the container / whatever triggered the rollout.
  3. Start the server with the same ZIPNET_INSTANCE, ZIPNET_SECRET, and ZIPNET_COMMITTEE_SECRET as before.
  4. Observe mosaik_groups_leader_is_local on the remaining servers — election should settle within a few seconds.
  5. Once the restarted server’s log shows round finalized, move to the next one.

Do not restart a majority of the committee simultaneously — that drops quorum and halts round progression until a majority is back up.

Adding a committee server

  1. Provision the new node. Assign it a fresh ZIPNET_SECRET seed.
  2. Distribute the same ZIPNET_INSTANCE and ZIPNET_COMMITTEE_SECRET to it.
  3. Start it with --bootstrap <peer_id_of_any_existing_server>.
  4. Wait for the new server’s log to print round finalized — it has caught up.
  5. Update your operational runbook, monitoring targets, and audit log to reflect the added node.

The ServerRegistry collection automatically reflects the new member within one round. Clients start including the new server in their pad derivation from the next OpenRound the leader issues.

Removing a committee server

  1. Announce the removal at least one gossip cycle ahead (default 15 s) so catalog entries expire cleanly.
  2. SIGTERM the target node.
  3. Verify the remaining servers still form a majority and continue to finalize rounds (round finalized events in the logs).

Security warning

A removed server retains its DH secret. If that secret is not wiped, an adversary who later compromises the decommissioned machine can replay historic rounds and compute that server’s share of past pads. Combined with any other committee server’s DH secret compromise, this would break anonymity of past rounds. Wipe DH secrets on decommission.

Rotating a committee server’s long-term key

v1 does not have first-class key rotation. The procedure is “decommission + re-add”:

  1. Remove the old server (above).
  2. Add a new server with a fresh ZIPNET_SECRET (above).

The committee’s GroupId does not change (it depends on the instance name and shared ZIPNET_COMMITTEE_SECRET, not on individual node identities), so the Raft group persists across the swap. The ServerRegistry entry is updated automatically.
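A sketch of why the swap preserves the group, assuming (as this page states) that GroupId depends only on the instance name and the shared committee secret. DefaultHasher stands in for whatever keyed hash mosaik actually uses; the derivation below is illustrative, not mosaik's:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical GroupId derivation: individual node identities
/// (ZIPNET_SECRET seeds) are deliberately absent from the input.
fn group_id(instance: &str, committee_secret: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (instance, committee_secret).hash(&mut h);
    h.finish()
}

fn main() {
    let before = group_id("acme.mainnet", "s3cret");
    // Decommissioning one member and adding another changes nothing here,
    // so the Raft group persists across the swap...
    assert_eq!(before, group_id("acme.mainnet", "s3cret"));
    // ...while rotating the committee secret yields a brand-new group.
    assert_ne!(before, group_id("acme.mainnet", "new-s3cret"));
}
```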

Rotating the committee secret

This is disruptive: changing ZIPNET_COMMITTEE_SECRET changes the GroupId, so the old committee is abandoned. External publishers compiled against the instance name still bond, but the committee they find is new.

  1. Announce a maintenance window.
  2. Stop every client, aggregator, and committee server on this instance.
  3. Distribute the new ZIPNET_COMMITTEE_SECRET to all committee members.
  4. Start the committee first, then the aggregator, then the clients.

Rotating round parameters

RoundParams (num_slots, slot_bytes, tag_len) is folded into the committee’s state-machine signature. Changing it is equivalent to rotating the committee secret (above), and it is a breaking change for any publisher that compiled the old parameters in — meaning in practice you bump the instance.

See Retiring and replacing an instance below.

Dev note

Developers changing RoundParams in code must also bump the signature string in CommitteeMachine::signature() — otherwise old and new nodes silently derive the same GroupId while disagreeing on apply semantics. See The committee state machine.
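One common convention is to bake the geometry into the signature string itself, so forgetting the bump becomes impossible. A sketch of that pattern — the format string and field names here are illustrative, not zipnet's actual signature():

```rust
/// Hypothetical round geometry, mirroring the fields this book names.
struct RoundParams { num_slots: u32, slot_bytes: u32, tag_len: u32 }

/// Folding the geometry into the signature means any RoundParams change
/// automatically changes the derived GroupId instead of silently
/// diverging on apply semantics.
fn signature(p: &RoundParams) -> String {
    format!("zipnet-committee/v1/{}x{}/tag{}", p.num_slots, p.slot_bytes, p.tag_len)
}

fn main() {
    let v1 = RoundParams { num_slots: 64, slot_bytes: 256, tag_len: 16 };
    let wider = RoundParams { slot_bytes: 512, ..v1 };
    assert_ne!(signature(&v1), signature(&wider));
}
```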

Rebuilding a TDX image

Rebuilding the committee or client image produces a new MR_TD. The committee’s ticket validator is pinned to a specific MR_TD, so a rebuild requires coordinated rollout:

  1. Build the new image with cargo build --release (the mosaik TDX builder runs in build.rs, producing a fresh mrtd.hex).
  2. Publish the new mrtd.hex to your release-notes channel.
  3. Decide whether the change is ABI-compatible with the current committee’s expectations:
    • Patch-level image change (kernel patch, initramfs tweak, no wire-format or state-machine change): accept both MR_TDs transiently by updating the committee’s require_mrtd list to include the new hash, roll the committee hosts one at a time to the new image, then drop the old MR_TD from the allow-list.
    • Breaking change (new state-machine signature, new wire format, new RoundParams): treat it as retiring the instance (below).
  4. Sign and publish the new MR_TD, along with the retirement window for the old one, so publishers can rebuild their own images in time.

Retiring and replacing an instance

Use this path whenever a cross-compatibility boundary moves (RoundParams, CommitteeMachine::signature, wire format, breaking MR_TD change). You have two idiomatic versioning stories:

  • Version in the name. Stand up the new deployment under a new instance name (acme.mainnet.v2). Old and new run in parallel for the transition window; publishers re-pin and rebuild at their own pace; you tear down the old instance when traffic has drained. The cleanest story for external publishers; forces them to cut a release.
  • Lockstep release against a shared deployment crate. Keep the instance name stable, cut a new deployment-crate version pinning the new state-machine signature, and coordinate operator + publisher upgrades as a single release event. Avoids instance-ID churn at the cost of tighter release-cadence coupling.

Zipnet v1 does not mandate which you pick; see Designing coexisting systems on mosaik — Versioning under stable instance names for the full tradeoff.

Retirement itself is just stopping every server under the old instance name. Publishers still trying to bond see ConnectTimeout; they rebuild against the new name or the new deployment crate and reconnect.

Upgrading the binary

Patch-level upgrades (no CommitteeMachine::signature change, no RoundParams change, no wire format change, no MR_TD change if TDX-gated) are safe to roll one node at a time following the restart procedure.

Upgrades that change any of those four cross a compatibility boundary — treat them like retiring the instance.

Dev notes on where to look in source:

  • WIRE_VERSION in crates/zipnet-proto/src/lib.rs
  • CommitteeMachine::signature in crates/zipnet-node/src/committee.rs
  • RoundParams::default_v1 in crates/zipnet-proto/src/params.rs

Any change to those requires a coordinated restart of the whole instance.

See also

Monitoring and alerts

audience: operators

Zipnet inherits mosaik’s Prometheus exporter. Enable it by setting ZIPNET_METRICS=0.0.0.0:9100 (or a port of your choice) on every node you want scraped. See Metrics reference for the complete list; this page covers the metrics that actually tell you whether an instance is healthy.

All zipnet-emitted metrics carry an instance="<name>" label set from ZIPNET_INSTANCE. Scope your alert rules on that label so a stuck preview.alpha doesn’t page the on-call for acme.mainnet.

The three questions you ask every shift

1. “Are rounds finalizing?”

The authoritative signal is new entries appearing in the Broadcasts collection. Track the rate of round finalized log events on committee servers (INFO level). A healthy instance finalizes one round per ZIPNET_ROUND_PERIOD interval, plus or minus ZIPNET_FOLD_DEADLINE.

Alert condition: no round finalized event on a leader server for 3 × ROUND_PERIOD + ROUND_DEADLINE.
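When you wire this into an alert rule, compute the threshold from the instance's actual settings rather than hard-coding it. A small helper using the example values from earlier pages (2 s period, 15 s deadline — your instance's values may differ):

```rust
use std::time::Duration;

/// Stall threshold from this page: 3 × ROUND_PERIOD + ROUND_DEADLINE.
fn stall_threshold(round_period: Duration, round_deadline: Duration) -> Duration {
    3 * round_period + round_deadline
}

fn main() {
    let t = stall_threshold(Duration::from_secs(2), Duration::from_secs(15));
    // 21 s at the example settings: alert if no round finalized for this long.
    println!("stall threshold: {:?}", t);
}
```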

2. “Is the committee healthy?”

  • Exactly one committee server in this instance should report itself as leader at any one time. If zero or two-plus, investigate (see Incident response — split-brain). The relevant metric is mosaik_groups_leader_is_local{instance="…"}.
  • Bond count per server should equal N − 1 where N is the committee size. A dropped bond suggests a universe-level partition or an expired ticket.
  • Raft log position should advance in lockstep across servers. A persistent lag (> 5 indices) on one server indicates that node is falling behind.

3. “Are clients and their pubkeys reaching the committee?”

  • ClientRegistry size ≈ number of clients you launched for this instance, give or take gossip cycles.
  • Per-round participants count in round finalized events ≈ the number of non-idle clients.

Alert condition: participants = 0 for two consecutive rounds while you expected > 0.

Useful log filters

On committee servers:

journalctl -u zipnet-server@acme-mainnet -f \
  --grep='round finalized|opening round|submitted partial|SubmitAggregate|rival group leader'

On the aggregator:

journalctl -u zipnet-aggregator@acme-mainnet -f \
  --grep='forwarded aggregate|registering client'

On clients:

journalctl -u zipnet-client@acme-mainnet -f \
  --grep='sealed envelope|registration'

(Adjust for your process supervisor.)

Baseline expectations at default parameters

| Condition | Committee server | Aggregator | Client |
|---|---|---|---|
| Steady-state CPU | < 5 % on a mid-range core | varies with client count | < 1 % |
| RAM | 50–200 MB | 100–500 MB | 20–50 MB |
| Bond count | committee_size − 1 | 0 (not a group member) | 0 |
| Gossip catalog size | total universe node count ± 2 | total universe node count ± 2 | total universe node count ± 2 |
| Inbound per round | N × B / committee_size (replication) | N × B | B |
| Outbound per round | B + heartbeats | committee_size × B | B |

N = clients, B = broadcast vector bytes (default 16 KiB).

Dev note

The gossip catalog includes peers from every service on the shared universe, not just zipnet. Your catalog size may be much larger than your committee size if the universe also hosts multisig signers, oracles, or other mosaik agents. Do not alert on absolute catalog size; alert on change in catalog size relative to a baseline.

Sensible alerts to configure

  1. Round stall. No new Broadcasts entry for 3 × ROUND_PERIOD + ROUND_DEADLINE. Page on-call: committee is stuck, aggregator is down, or min_participants is unmet.
  2. Committee partition. sum by (instance) (mosaik_groups_leader_is_local{instance="…"}) is 0 or ≥ 2 for more than 1 minute. Page on-call.
  3. TDX attestation approaching expiry. Less than 24 h to ticket exp on any bonded peer. Page TEE operator.
  4. Bond drop. mosaik_groups_bonds{peer=<known>,instance="…"} drops from 1 to 0 for more than 30 s and does not recover.

Multi-instance dashboards

Since multiple instances share the same universe and the same host fleet, build dashboards with instance as a dimension from the start:

  • A top-level panel showing rate(zipnet_round_finalized_total[1m]) broken out by instance.
  • A committee-health grid: rows are instances, columns are the committee members, cells are mosaik_groups_leader_is_local.
  • A per-instance heatmap of participants over time — sparse rounds are often the first hint of a sick publisher fleet.

A starter Grafana dashboard is not shipped in v1. The metrics list in Metrics reference is sufficient to build one. A community-maintained dashboard is tracked as a v2 follow-up.

See also

Incident response

audience: operators

This page is a runbook. It lists the failure modes we have actually observed in testing and the minimal steps that resolve each. Each section is scoped to a single instance — if multiple instances on the same universe are misbehaving at once, something is wrong at the universe level (relays, DHT, network) rather than in any one instance, and you should start with the “Discovery is slow” section.

Stuck rounds

Symptom: no round finalized log on any committee server in this instance for more than 3 × ROUND_PERIOD + ROUND_DEADLINE.

Root-cause checklist, in order of likelihood:

  1. Fewer active clients than ZIPNET_MIN_PARTICIPANTS. The leader won’t open a round until this threshold is met.

    • Check: zipnet_client_registry_size{instance="…"} on any committee server.
    • Fix: either start more clients (or cover-traffic filler) or lower ZIPNET_MIN_PARTICIPANTS (rolling restart of the committee — this is in the state machine’s signature derivation, so everyone needs the same value).
  2. Committee has no leader. Raft election has not settled (yet, or ever).

    • Check: mosaik_groups_leader_is_local{instance="…"} == 0 on all members.
    • Fix: usually self-heals within ELECTION_TIMEOUT + BOOTSTRAP_DELAY. If persistent, suspect clock skew or a full network partition.
  3. Client bundles have not replicated to the committee. Clients have connected but their bundles haven’t landed in ClientRegistry — the aggregator hasn’t yet mirrored them in.

    • Check: aggregator log for registering client bundle; this should fire for each new client.
    • Fix: ensure the aggregator is reachable from every client (correct ZIPNET_BOOTSTRAP or working universe discovery). Wait one gossip cycle (≈ 15 s).
  4. One or more server bundles missing from ServerRegistry. A committee server failed to self-publish.

    • Check: query ServerRegistry size on each committee server; should equal committee size.
    • Fix: restart the offending server; it re-publishes on boot.

If a publisher reports Error::ConnectTimeout that traces back to any of the root causes above, it is an operator-side issue surfacing as a user-side error. The SDK cannot distinguish “my instance name is wrong” from “the operator’s committee is stuck” — that’s a deliberate tradeoff of the no-registry design.

Split-brain

Symptom: two or more committee servers in this instance report mosaik_groups_leader_is_local == 1, or a server’s log shows rival group leader detected.

v1 uses mosaik’s modified Raft which resolves rivals by term. The system self-heals within one ELECTION_TIMEOUT. If it does not self-heal:

  1. Check clock skew across committee members (ntpdate -q on each). More than a few seconds of skew breaks Raft timing.
  2. Check the network — split-brain persisting past self-heal is a partition.
  3. As a last resort, SIGTERM the minority faction. They’ll rejoin as followers.

Do not change ZIPNET_COMMITTEE_SECRET mid-incident. It would force a fresh committee group and hide evidence of the split, not resolve it.

Committee quorum loss

Symptom: fewer than a majority of committee servers are reachable. Rounds cannot commit.

  1. Restore the failed nodes. They rejoin on startup.
  2. If restoration is impossible (hardware loss, etc.), a v1 deployment has no graceful recovery — retire the instance and stand up a fresh one under a new name (or bump the deployment crate version). See Rotations and upgrades — Retiring and replacing an instance.

Aggregator crash-loop

Symptom: aggregator exits or OOMs shortly after boot.

Most common cause in v1: too many concurrent clients pushing envelopes larger than the internal buffer (buffer_size = 1024 per mosaik default).

Fix: either lower client concurrency by splitting the publisher fleet across multiple instances (each with its own ZIPNET_INSTANCE), or tune the aggregator’s stream buffer when calling network.streams().consumer::<ClientEnvelope>().with_buffer_size(N) — this requires a code change in zipnet-node (dev task).

TDX attestation expiry

Symptom: committee rejects a previously-good peer with unauthorized; the peer re-bonds in a loop with the same outcome. On the peer side, logs mention an expired quote.

Causes, in order of likelihood:

  1. Quote exp elapsed. Each TDX quote carries an expiration. The bonded peer needs a fresh quote.
    • Fix: restart the peer. On restart the TDX layer fetches a new quote from the hardware. If the peer still fails, check the TDX host’s attestation service reachability.
  2. Clock skew between the peer and the committee. The committee rejects a quote whose exp has already passed in its local clock.
    • Fix: NTP on both sides.
  3. MR_TD mismatch. The peer is running a different image than the committee expects. Common after a committee rebuild the peer hasn’t yet picked up.

Discovery is slow (universe-level)

Symptom: nodes log Could not bootstrap the routing table and take minutes to find each other. Typically affects all instances on the same universe simultaneously.

Usual cause: iroh’s pkarr / Mainline DHT bootstrap is struggling (common on fresh residential networks or a fresh universe). Workarounds:

  1. Pass an explicit ZIPNET_BOOTSTRAP=<peer_id> on every non-bootstrap node.
  2. Enable mDNS discovery (already on by default in this prototype). For LAN deployments this is often enough.
  3. Run a mosaik bootstrap node (see mosaik’s examples/bootstrap.rs) with a well-known public address and seed it everywhere.

A dedicated bootstrap node is recommended for any production universe that hosts more than one zipnet instance.

When to escalate

  • Unknown log messages containing committed or reverted outside the expected Raft lifecycle.
  • Broadcasts collection contains entries where the number of servers in the record does not match your configured committee size for this instance.
  • Any indication that two clients with the same ClientId coexist (would mean someone forged a bundle — investigate as a security incident).
  • Publishers reporting WrongUniverse — indicates an operator misconfiguration of ZIPNET_UNIVERSE, or a publisher using the wrong zipnet::UNIVERSE constant.

See also

Accounting and audit

audience: operators

Anonymous broadcast looks, from the outside, uncomfortably like a thing you cannot account for. Auditors will ask. This page tells you what you can attest to, what you cannot, and how to produce evidence for each. Everything here is scoped to a single zipnet instance — multiple instances on the same universe are separately audited against their own committee roster and Broadcasts collection.

What the protocol is designed to guarantee

  • Given at least one honest committee server, no party — not the operator, not the aggregator, not the remaining committee members, not an outside observer of the network — can determine which client authored which published broadcast.
  • Given all parties operating the protocol honestly, every broadcast in the Broadcasts log is the XOR-sum of the messages of the clients listed in that round’s participants field, subject to slot collisions.
  • Committed broadcasts are signed-in-transit by every bonded pair and logically signed by the Raft leader at commit time. Replays are detectable.

What the protocol is not designed to guarantee

  • Who an individual ClientId refers to. A client’s ClientId is a hash of its X25519 public key, not a legal identity. You will need an out-of-band registration process if you want to tie a ClientId to a legal entity.
  • That a broadcast is well-formed. A malicious client can put garbage in its slot. The falsification tag protects honest clients from other clients corrupting their slot, but not from a client corrupting its own slot.
  • Censorship-resistance. A malicious aggregator or a majority of malicious committee servers can delay or drop rounds. Anonymity still holds; availability does not.

What you can attest to

“Did this instance publish this broadcast on this date?”

Every entry in the instance’s Broadcasts collection carries:

  • round: RoundId
  • participants: Vec<ClientId> — snapshot of the active clients at round-open time
  • servers: Vec<ServerId> — committee members that contributed partials
  • broadcast: Vec<u8> — the final XORed vector

Together with the Raft commit index, this is a point-in-time claim signed (through the bond layer) by every committee server. Archive the Broadcasts entries you care about, keyed by instance name — there is no authoritative external registry.

“Who was running which node on this date?”

This is an organizational fact, not a cryptographic one. Maintain an external table per instance:

| Instance | PeerId | Legal entity | Role | Valid from | Valid to |
|---|---|---|---|---|---|
| acme.mainnet | f5e28a… | Acme Corp | committee-server-1 | 2026-03-01 | present |
| acme.mainnet | 4c210e… | Acme Corp | aggregator | 2026-03-01 | present |
| acme.preview | a91742… | Acme Corp | committee-server-1 | 2026-04-02 | present |

Sign this table with your corporate root, version it, and include it in your audit package. PeerId is stable when ZIPNET_SECRET is stable; rotate only via a documented procedure (see Rotations and upgrades).

“Was a specific server in the committee on this round?”

BroadcastRecord::servers lists every committee member whose partial unblind was folded into the published broadcast. Combine with your PeerId → legal entity table to produce a legal-readable statement.

“Did this committee server operate honestly?”

You cannot prove this from the record alone — a malicious committee member can behave indistinguishably from an honest one, provided at least one other committee member is honest. (That’s the whole point of the any-trust model.) What you can attest to:

  • The server was up and participating (its partial is folded in).
  • The server’s key material was controlled by the claimed legal entity (via the PeerEntry signature).
  • For TDX-gated instances, the server’s boot measurement matched the committee’s pinned MR_TD. Archive the quote alongside the instance deployment record (see below).

In regulatory settings where “operated honestly” must be proven positively, a TDX attestation is as close as the protocol gets — the quote cryptographically proves the code running inside the committee server matches a published image hash.

Archival recommendations

  1. Archive Broadcasts continuously, per instance. A committee server’s in-memory copy is the source of truth in v1; if the majority of the committee goes offline at once, the log is gone. Mirror the log into durable storage at your cadence of choice. A minimal script: open a Zipnet::bind(&network, instance) handle in read-only mode from a non-committee host, iterate entries newer than your checkpoint, append to a signed ledger, commit.

  2. Archive the PeerId table, keyed by instance. Version it; keep change history. A SHA-256 of this table goes into your audit manifest.

  3. Archive the instance configuration. For each instance:

    • Instance name.
    • ZIPNET_COMMITTEE_SECRET’s blake3 fingerprint (not the raw secret).
    • RoundParams.
    • ConsensusConfig.
    • Committee roster.
    • Committee MR_TD (if TDX-gated).
  4. Archive TDX attestation quotes. For TDX-gated instances, each committee server’s quote includes its MR_TD and RTMRs. Store them per instance, per deploy.

Evidence package for external audit

A minimal per-quarter package, per instance:

  • Instance name and its universe NetworkId.
  • Broadcasts log excerpt for the quarter (signed by your corporate root).
  • PeerId → legal entity table for that instance (signed, version-pinned).
  • Instance configuration fingerprint: SHA-256 of blake3(COMMITTEE_SECRET) || blake3(ROUND_PARAMS) || blake3(CONSENSUS_CONFIG) || instance_name.
  • Committee MR_TD (TDX-gated instances).
  • List of committee membership changes, cross-referenced to git/CD deployment records.
  • Incident log covering any stuck rounds, split-brain events, or membership changes in the period.
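The configuration-fingerprint item can be made concrete with a stand-in hash (`DefaultHasher` below substitutes for both blake3 and SHA-256; only the concatenation order is faithful to the recipe above):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in 64-bit hash. The real manifest uses blake3 for the inner
// fingerprints and SHA-256 for the outer digest.
fn h(bytes: &[u8]) -> [u8; 8] {
    let mut s = DefaultHasher::new();
    bytes.hash(&mut s);
    s.finish().to_be_bytes()
}

/// Outer fingerprint over the inner fingerprints plus the instance name,
/// concatenated in the documented order.
fn config_fingerprint(secret: &[u8], round_params: &[u8], consensus: &[u8], instance: &str) -> [u8; 8] {
    let mut buf = Vec::new();
    buf.extend_from_slice(&h(secret));
    buf.extend_from_slice(&h(round_params));
    buf.extend_from_slice(&h(consensus));
    buf.extend_from_slice(instance.as_bytes());
    h(&buf)
}

fn main() {
    let a = config_fingerprint(b"secret", b"params", b"consensus", "acme.mainnet");
    // Same inputs reproduce the fingerprint; changing any field breaks it,
    // so an auditor can verify the config without ever seeing the secret.
    assert_eq!(a, config_fingerprint(b"secret", b"params", b"consensus", "acme.mainnet"));
    assert_ne!(a, config_fingerprint(b"secret", b"params", b"consensus", "acme.preview"));
}
```

Note that only fingerprints of `ZIPNET_COMMITTEE_SECRET` enter the manifest; the raw secret never leaves the secret manager.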

An auditor can re-derive the ClientIds referenced in participants from the corresponding signed PeerEntry tickets archived from gossip — useful if they want to ask “was client X part of round Y?”.

Multiple instances, shared universe

Because zipnet instances share a universe, an auditor who reads your raw gossip logs will see traffic that belongs to other instances — and possibly to other mosaik services entirely. Two consequences to call out in your audit narrative:

  • Gossip-level traffic volume from your fleet is not a proxy for your instance’s traffic. A committee server on acme.mainnet routinely forwards discovery messages on behalf of other instances and services on the same universe.
  • Peer-catalog size is likewise a universe-level quantity. Do not attempt to derive per-instance population from catalog counts.

For per-instance accounting, stick to the Broadcasts collection and the ServerRegistry / ClientRegistry contents read through Zipnet::bind(&network, instance).

Privacy and data retention

Published broadcasts are, by design, readable by anyone who can read the Broadcasts collection. Treat them as public data. Archival retention policy is a business decision; the protocol neither enforces nor contradicts any specific retention period.

Signed PeerEntrys (carrying peers’ ClientBundles / ServerBundles) are also public by design — they are gossiped to every universe member. There is no way to revoke a signed entry retroactively.

Security warning

Do not publish ZIPNET_COMMITTEE_SECRET or any committee server’s X25519 secret, historic or current. Disclosure of every committee server’s DH secret breaks anonymity of every round in which those servers participated — the any-trust guarantee holds only while at least one participating server’s secret stays private, so each individual leak erodes that margin.

See also

Security posture checklist

audience: operators

Each item below is a pre-production checklist entry. Print it, initial it, file it with the deploy record. Work through this checklist per instance — an honest posture on acme.mainnet does not protect preview.alpha if the two share a fault domain or a secret store.

Instance identity and scope

  • ZIPNET_INSTANCE is set to a namespaced string (e.g. acme.mainnet) and documented in the release notes your publishers consume. No operator within the same universe uses the same string.
  • ZIPNET_UNIVERSE, if set, points at a universe you control. The default (zipnet::UNIVERSE) is the shared world and is correct for most deployments.
  • The instance’s MR_TD (TDX-gated instances) is published alongside the instance name in a signed channel. Publishers verify against that hash.

Committee secret handling

  • ZIPNET_COMMITTEE_SECRET is stored only in a secret manager (AWS Secrets Manager, HashiCorp Vault, a Kubernetes Secret resource, or similar). Never in a git repo, never in a plain environment file.
  • The secret is unique per instance. Do not reuse one committee secret across acme.mainnet and acme.preview even though the operator is the same.
  • Rotation procedure is documented and rehearsed (see Rotations and upgrades).
  • Access to read the secret is audited. A quarterly review of access logs is on the calendar.

Committee server node hygiene

  • Each committee server runs in a separate fault domain (different cloud account, different region, different operator organization if possible). The whole point of any-trust is diversity.
  • In production, every committee server runs inside a TDX guest built by the mosaik image builder. The committee’s require_mrtd(...) validator is set to the build’s measured MR_TD. See Rebuilding a TDX image for the rebuild cadence.
  • ZIPNET_SECRET is unique per node and stored in the node’s own secret scope (not shared with any other node).
  • Committee servers listen only on the iroh port (default UDP ephemeral + relay) and the Prometheus metrics port. No other inbound exposure.
  • Decommissioned committee servers have their disks wiped. DH secrets leaking from a decommissioned box are historically replayable.

Aggregator node hygiene

  • The aggregator is not in the committee’s secret-possession circle. It does not have access to ZIPNET_COMMITTEE_SECRET.
  • Aggregator memory is not a secret store — aggregates are XOR-sums whose plaintext only the committee can recover. Still, hardening the aggregator is good practice: read-only filesystem, dropped capabilities, etc.
  • If you operate one aggregator per instance, each is configured with its own ZIPNET_INSTANCE and its own ZIPNET_SECRET.

Client image hygiene (TDX-gated instances)

  • The client image you ship to publishers is built reproducibly. The mosaik TDX builder is deterministic — commit your toolchain and feature-flag set alongside the release.
  • The committee’s Tdx validator lists the published client MR_TD in require_mrtd(...). Publishers running any other image are rejected at bond time.
  • TDX quote expiration is monitored; see Monitoring.
  • Image rebuild cadence is documented. At minimum, rebuild whenever the upstream kernel or initramfs toolchain ships a security fix — a new MR_TD is cheap compared with unpatched firmware.

Client image hygiene (TDX disabled, dev/test only)

  • Understood: without TDX, the client trusts the client host for DH key protection. Anyone with access to the client process can deanonymize that client’s own messages (not others’).
  • Clients handling non-public messages wait for the ClientRegistry to include their own entry and wait for at least ZIPNET_MIN_PARTICIPANTS − 1 other clients to also be registered before relying on anonymity properties.
  • This posture is explicitly not used for production in TDX-gated instances.

Network hygiene

  • Firewalls permit outbound UDP to iroh relays. If you run your own relay, ensure clients can reach it.
  • NTP is configured on every node. Raft tolerates small skew; large skew causes election storms. TDX quote validation is also clock-sensitive.
  • Prometheus metrics endpoints are NOT publicly exposed.

Archival / audit

  • A job pulls the Broadcasts collection to durable storage at the chosen cadence, keyed by instance name (see Accounting and audit).
  • PeerId → legal entity registry is version-controlled, signed, and scoped per instance.

Emergency contacts

  • On-call rotation documented for each node, per instance.
  • Break-glass procedure for committee-secret rotation documented, per instance.
  • “Who can revoke a compromised bundle ticket” is specified — note that in v1 a ticket lives in gossip until the node is removed from the universe, so the answer is “the node’s operator, by stopping the node”.

Known-not-yet-protected footguns

  • Metadata from iroh. The iroh layer leaks some metadata (relay preferences, coarse geography via relay choice). A global passive adversary observing traffic patterns across relays can narrow anonymity sets.
  • Cross-instance traffic correlation. Instances share a universe. A passive observer of gossip can often tell “this peer is a member of instance X” from catalog membership, even without seeing any Broadcasts content. Anonymity within a round is unaffected; anonymity of membership in an instance is not a property the protocol provides.
  • Client message length. The protocol encrypts the message but does not pad it to a uniform length. Unusually long messages are recognizable in the broadcast. Pad your payloads to the nearest slot boundary at the application layer if this matters for you.
  • Participant set disclosure. BroadcastRecord::participants lists every ClientId whose envelope was folded into the round. Knowing “client X was in this round” is not the same as knowing “client X wrote this message”, but it is visible and it leaks connection timing.

These are tracked in Roadmap to v2.
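For the message-length footgun above, application-layer padding can be as simple as this sketch (the u32 length-prefix framing and the `pad_to_slot` / `unpad` helpers are hypothetical, not zipnet API):

```rust
/// Application-layer padding: frame the message with a u32 length prefix,
/// then zero-pad to the next multiple of `slot` bytes.
// Hypothetical helper — zipnet itself ships the payload unpadded.
fn pad_to_slot(msg: &[u8], slot: usize) -> Vec<u8> {
    let framed = 4 + msg.len(); // length prefix + body
    let padded = framed.div_ceil(slot) * slot;
    let mut out = Vec::with_capacity(padded);
    out.extend_from_slice(&(msg.len() as u32).to_be_bytes());
    out.extend_from_slice(msg);
    out.resize(padded, 0);
    out
}

/// Recover the original message from a padded buffer.
fn unpad(buf: &[u8]) -> &[u8] {
    let len = u32::from_be_bytes(buf[..4].try_into().unwrap()) as usize;
    &buf[4..4 + len]
}

fn main() {
    let padded = pad_to_slot(b"attack at dawn", 64);
    assert_eq!(padded.len(), 64); // one slot
    assert_eq!(unpad(&padded), b"attack at dawn");
    // Messages within the same slot bucket are indistinguishable by length.
    assert_eq!(pad_to_slot(b"attack at noon", 64).len(), padded.len());
}
```

Pick the slot size so that your longest expected payload fits in one bucket; only messages that spill into a larger bucket remain recognizable.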

See also

Designing coexisting systems on mosaik

audience: contributors

Mosaik composes primitives: Stream, Group, Collection, TicketValidator. It does not prescribe how a whole service — a deployment with its own operator, its own ACL, its own lifecycle — is shipped onto a network and made available to third-party agents. That convention lives one layer above mosaik and has to be invented per service family.

This page describes the convention zipnet uses, why it was picked, and what a contributor building the next service on mosaik (multisig signer, secure storage, attested oracle, …) should reuse. It is a mental model, not an API reference: the concrete instantiation is in Architecture.

The problem

A mosaik network is a universe where any number of services run concurrently. Each service:

  • is operated by an identifiable organisation (or coalition) and has its own ACL
  • ships as a bundle of internally-coupled primitives — usually a committee Group, one or more collections backed by that group, and one or more streams feeding it
  • must be addressable and discoverable by external agents who do not operate it
  • co-exists with many other instances of itself (testnet, staging, per-tenant deployments) and with unrelated services on the same wire

The canonical shape zipnet itself was built for is an encrypted mempool — a bounded set of TEE-attested wallets publishing sealed transactions for an unbounded set of builders to read, ordered and unlinkable to sender. Other services built on this pattern (signers, storage, oracles) have the same structural properties.

Nothing about these requirements is in mosaik itself. The library will happily let you stand up ten Groups and thirty Streams on one Network; it says nothing about which of them constitute “one zipnet” versus “one multisig”.

Two axes of choice

Every design in this space picks a point on two axes.

  1. Network topology. Does a deployment live on its own NetworkId, or on a shared universe with peers of every other service?
  2. Discovery. How does an agent go from “I want zipnet-acme” to bonded-and-consuming without hardcoded bootstraps or out-of-band config?

Four shapes fall out:

| Shape | Topology | When to pick |
|---|---|---|
| A. Service-per-network | One NetworkId per deployment; agents multiplex many Network handles | Strong isolation, per-service attestation scope, no cross-service state |
| B. Shared meta-network | One universe NetworkId; deployments are overlays of Groups/Streams | Many services per agent, cheap composition; a narrow public surface is required to tame noise |
| C. Derived sub-networks | ROOT.derive(service).derive(instance) hybrids | Isolation with structured discovery, still multi-network per agent |
| D. Service manifest | Orthogonal: a rendezvous record naming all deployment IDs | Composable with A/B/C; required for discoverable-without-out-of-band-config |

Zipnet picks B for topology, with optional derived private networks for high-volume internal plumbing, and compile-time instance-salt derivation for discovery — no on-network registry required. The rest of this page unpacks why and how.

Narrow public surface

The single most important discipline in this model is that a deployment exposes a small, named, finite set of primitives to the shared network. The ideal is one or two — a stream plus a collection, two streams, a state machine plus a collection, and so on. Everything else is private to the bundle and wired up by the deployment author, who is free to hardcode internal dependencies as aggressively as they like.

Zipnet’s outward surface decomposes cleanly into two functional roles, even though it carries several declare! types:

  • write-side: ClientRegistrationStream and ClientToAggregator — ticket-gated, predicate-gated, used by external TEE clients to join a round and submit sealed envelopes.
  • read-side: LiveRoundCell, Broadcasts, plus the two registries — read-only ambient round state that external agents need in order to seal envelopes and interpret finalized rounds.

An integrator’s mental model is “a way to write, a way to read”. They do not need to know the committee exists, how many aggregators there are, or how DH shuffles are scheduled. Internally the bundle looks like this:

  shared network                                     (public surface)
  ─────────────────────────────────────────────────────────────────
  ClientRegistrationStream, ClientToAggregator  ─┐
                                                 │
  LiveRoundCell, Broadcasts, ClientRegistry,   ◀─┤
  ServerRegistry                                 │
                                                 │
  ─────────────────────────────────────────────────
  derived private network (optional)             │  (private plumbing)
                                                 ▼
      Aggregator fan-in / DH-shuffle gossip      Committee Group<CommitteeState>
      Round-scheduler chatter                    AggregateToServers stream
                                                 BroadcastsStore (backs Broadcasts)

The committee Group stays on the shared network because the public-read collections are backed by it and bridging collections across networks is worse than the catalog noise. Only the genuinely high-churn channels belong on a derived private network.

The three conventions

Three things make this pattern work. A contributor starting a new service should reproduce all three.

1. Instance-salt discipline

Every public ID in a deployment descends from one root:

  INSTANCE     = blake3("zipnet." + instance_name)   // compile- or run-time
  SUBMIT       = INSTANCE.derive("submit")           // StreamId
  BROADCASTS   = INSTANCE.derive("broadcasts")       // StoreId
  COMMITTEE    = INSTANCE.derive("committee")        // GroupKey material
  ...

The top-level instance salt is a flat-string hash: compile-time via zipnet::instance_id!("acme.mainnet") (which expands to mosaik::unique_id!("zipnet.acme.mainnet")) and run-time via zipnet::instance_id("acme.mainnet") produce the same 32 bytes. Sub-IDs within the instance chain off it with .derive() for structural clarity.
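A stand-in sketch of the discipline (`DefaultHasher` replaces blake3 and the `UniqueId` here is 8 bytes rather than 32; only the shape of the derivation is faithful — flat-string hash at the root, `.derive()` chaining below it):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder for mosaik's 32-byte blake3-based UniqueId.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UniqueId(u64);

fn hash_str(s: &str) -> UniqueId {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    UniqueId(h.finish())
}

impl UniqueId {
    // Sub-IDs chain off the root instead of re-hashing flat strings.
    fn derive(&self, salt: &str) -> UniqueId {
        let mut h = DefaultHasher::new();
        self.0.hash(&mut h);
        salt.hash(&mut h);
        UniqueId(h.finish())
    }
}

// Run-time path; the compile-time macro would yield the same value.
fn instance_id(name: &str) -> UniqueId {
    hash_str(&format!("zipnet.{name}"))
}

fn main() {
    let instance = instance_id("acme.mainnet");
    // Both sides recompute identical public IDs from the name alone.
    assert_eq!(instance, hash_str("zipnet.acme.mainnet"));
    assert_ne!(instance.derive("submit"), instance.derive("broadcasts"));
    // A different instance name yields disjoint IDs everywhere.
    assert_ne!(instance_id("acme.preview").derive("submit"), instance.derive("submit"));
}
```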

An agent that knows instance_name can reconstruct every public ID from a shared declare! module. The consumer-side API is:

let zipnet   = Zipnet::bind(&network, "acme.mainnet").await?;
let receipt  = zipnet.publish(b"hello").await?;
let mut log  = zipnet.subscribe().await?;

Zipnet::bind is a thin constructor that derives the instance-local IDs and returns a handle wired to them. Raw StreamId/StoreId/GroupId values are never exposed across the crate boundary.

2. A Deployment-shaped convention

Authors should declare a deployment’s public surface once, in one place, so consumers can bind without reassembling ID derivations by hand. Whether this is a literal declare::deployment! macro or a hand-written impl Deployment is a question of ergonomics; the constraint is that the public surface is a declared, named, finite set of primitives — not “whatever the bundle happens to put on the network today”.

Every deployment crate should export:

  • the public declare::stream! / declare::collection! types for its surface, colocated in a single protocol module
  • a bind(&Network, instance_name) -> TypedHandles function
  • the intended TicketValidator composition for each public primitive

A service that exposes eight unrelated collections has probably not thought hard enough about its interface.

3. A naming convention, not a registry

Derivation from (service, instance_name) is enough for a consumer who knows the instance name to bond to the deployment: both sides compute the same GroupId, StreamIds, and StoreIds, and mosaik’s discovery layer does the rest. No on-network advertisement is required — the service does not need to advertise its own existence.

A consumer typically pins the instance as a compile-time constant:

const ACME_ZIPNET: UniqueId = zipnet::instance_id!("acme.mainnet");
let zipnet = Zipnet::bind_by_id(&network, ACME_ZIPNET).await?;

…or by string when convenient:

let zipnet = Zipnet::bind(&network, "acme.mainnet").await?;

The operator’s complete public contract is three items: the universe NetworkId, the instance name, and (if the instance is TDX-gated) the MR_TD of the committee image. These travel via release notes, docs, or direct handoff. Nothing about the binding path touches a registry.

A directory may exist — a shared collection listing known instances — but it is a devops convenience for humans enumerating deployments, not part of the consumer binding path. Build it if you need it; nothing about the pattern requires it.

What this buys you

  • A third-party agent’s mental model collapses to: “one Network, many services, each bound by instance name.”
  • Multiple instances of the same service coexist trivially — each derives disjoint IDs from its salt.
  • ACL is per-instance, enforced at the edge via require_ticket on the public primitives; no second ACL layer is needed inside the bundle.
  • Internal plumbing can move to a derived private network without changing the public surface.
  • Private-side schema changes (StateMachine::signature() bumps) are absorbed behind the instance identity, as long as operators and consumers cut releases against the same version of the deployment crate.

Where the pattern strains

Three things are not free under this convention. Every new service author should be honest about them up front.

Cross-service atomicity is out of scope

There is no way to execute “mix a message AND rotate a multisig signer” in one consensus transaction. They are different Groups with different GroupIds, possibly with disjoint membership. If a service genuinely needs that — rare, but real for some coordination-heavy cases — the right answer is a fourth primitive that is itself a deployment providing atomic composition across services, not an ad-hoc cross-group protocol.

Versioning under stable instance names

If StateMachine::signature() changes, GroupId changes, and consumers compiled against the old code silently split-brain. Under multi-instance, the expectation is that “zipnet-acme” is an operator-level identity that outlives schema changes. Two ways to reconcile:

  • Let the instance salt carry a version (zipnet-acme-v2), and treat version bumps as retiring the old instance. Clean, but forces consumers to re-pin and release a new build on every upgrade.
  • Keep the instance name stable across versions and require operators and consumers to cut releases in lockstep against a shared deployment crate version. Avoids churn in instance IDs, at the cost of tighter coupling between operator and consumer release cadences.

Zipnet v1 does not need to resolve this. V2 must.

Noisy neighbours on the shared network

A shared NetworkId means every service’s peers appear in every agent’s catalog. Discovery gossip, DHT slots, and bond maintenance scale with the universe, not with the services an agent cares about. The escape hatch is the derived private network for internal chatter; the residual cost — peer-catalog size and /mosaik/announce volume — is paid by everyone. If a service’s traffic would dominate the shared network (high-frequency metric streams, bulk replication) it belongs behind its own NetworkId, not on the shared one. Shape A is the correct call when the narrow-interface argument no longer outweighs the noise argument.

Checklist for a new service

When adding a service to a shared mosaik universe, use this list:

  1. Identify the one or two public primitives. If you cannot, the interface is not yet designed.
  2. Pick a service root: unique_id!("your-service").
  3. Define instance-salt conventions: what instance_name means, who picks it, whether it carries a version.
  4. Write a bind(&Network, instance) -> TypedHandles that every consumer uses. Never export raw StreamId/StoreId/GroupId values across the crate boundary.
  5. Decide which internal channels, if any, move to a derived private Network. Default: only the high-churn ones.
  6. Specify TicketValidator composition on the public primitives. ACL lives here.
  7. Document your instance-name convention in release notes or docs. Consumers compile it in; you are on the hook for keeping the name stable and the code release version-matched.
  8. Call out your versioning story before shipping. If you cannot answer “what happens when StateMachine::signature() bumps?”, you will regret it.

Cross-references

  • Architecture — the concrete instantiation of this pattern for zipnet v1.
  • Mosaik integration notes — gotchas and idioms specific to the primitives referenced here.
  • Roadmap to v2 — where versioning-under-stable-names and cross-service composition work live.

Architecture

audience: contributors

This chapter is the concrete instantiation of the pattern described in Designing coexisting systems on mosaik for zipnet v1. It maps the paper’s three-part architecture (§2) onto mosaik primitives and identifies which of those primitives form the public surface on the shared universe versus the private plumbing that may live on a derived sub-network.

The reader is assumed to have read the ZIPNet paper, the mosaik book, and design-intro.

Deployment model recap

Zipnet runs as one service among many on the shared mosaik universe zipnet::UNIVERSE = unique_id!("mosaik.universe"). A deployment is a single zipnet instance: one committee, one ACL, one set of round parameters, one operator. Many instances coexist on the universe.

An instance is identified by a short operator-chosen name (acme.mainnet). Every public id in the instance descends from the instance salt:

  INSTANCE     = blake3("zipnet." + instance_name)    // root UniqueId
  COMMITTEE    = INSTANCE.derive("committee")         // Group<M> key material
  SUBMIT       = INSTANCE.derive("submit")            // ClientToAggregator StreamId
  REGISTER     = INSTANCE.derive("register")          // ClientRegistrationStream StreamId
  BROADCASTS   = INSTANCE.derive("broadcasts")        // Vec<BroadcastRecord> StoreId
  LIVE         = INSTANCE.derive("live-round")        // Cell<LiveRound> StoreId
  CLIENT_REG   = INSTANCE.derive("client-registry")   // Map StoreId
  SERVER_REG   = INSTANCE.derive("server-registry")   // Map StoreId

Consumers recompute the same derivations from the same name; no on-wire registry is involved. See design-intro — Instance-salt discipline.

Public surface (what lives on UNIVERSE)

The instance’s outward-facing primitives decompose into two functional roles:

  • write-side: ClientRegistrationStream + ClientToAggregator. Ticket-gated, consumed by the aggregator. External TEE clients use these to join a round and submit sealed envelopes.
  • read-side: LiveRoundCell + Broadcasts + ClientRegistry + ServerRegistry. Read-only ambient round state every external agent needs in order to seal envelopes and interpret finalized rounds.

Integrators bind via the facade:

let network = Arc::new(Network::new(zipnet::UNIVERSE).await?);
let zipnet  = Zipnet::bind(&network, "acme.mainnet").await?;
let receipt = zipnet.publish(b"hello").await?;
let mut log = zipnet.subscribe().await?;

The facade hides StreamId / StoreId / GroupId entirely; they never cross the zipnet crate boundary.

Internal plumbing (optional derived private network)

Everything that is not part of the advertised surface is deployment-internal. In v1 it all runs on UNIVERSE alongside the public surface; this is the simplest place to start. A future deployment topology may move the high-churn channels onto a derived private Network keyed off INSTANCE.derive("private"):

  • AggregateToServers — aggregator → committee fan-out
  • any footprint-scheduling gossip
  • round-scheduler chatter

The committee Group<CommitteeMachine> itself stays on UNIVERSE because LiveRoundCell / Broadcasts / the two registries are backed by it; bridging collections across networks is worse than the extra catalog noise. See design-intro — Narrow public surface.

Data flow

                    shared universe (public surface)
  +--------+  ClientToAggregator   +-------------+  AggregateToServers  +-------------+
  | Client |  (stream)             |  Aggregator |  (stream) [*]        |  Committee  |
  |  TEE   | --------------------> |   role      | -------------------> |  Group<M>   |
  +--------+                       +-------------+                      +-------------+
       |                                    |                                    |
       |  ClientRegistrationStream          |                                    |
       +----------------------------------->|                                    |
                                            |                                    |
                        +-------------------+---------------------+--------------+
                        |                                                        |
                 ClientRegistry (Map<ClientId, ClientBundle>)    ServerRegistry (Map<ServerId, ServerBundle>)
                        |                                                        |
                        +-------------------------+------------------------------+
                                                  |
                                        LiveRoundCell (Cell<LiveRound>)
                                                  |
                                        Broadcasts (Vec<BroadcastRecord>)

  [*] may migrate to a derived private network in a future topology.

All four collections are declare::collection!-declared with intent-addressed StoreIds. The three streams are declare::stream!-declared the same way. In v1 every derived id salt is a literal string; a forthcoming Deployment-shaped convention (see design-intro §The three conventions) will replace the literal strings with chained .derive() calls off INSTANCE.

Pipeline per round

                t₀         t₁               t₂                    t₃
                 |          |                |                     |
  leader: ──── OpenRound ─── committed ─── LiveRoundCell mirrored  ─── Broadcasts appended
                 │          (to followers)                              (on finalize)
                 ▼
clients:    read LiveRoundCell,  seal envelope,  send on ClientToAggregator
                                                       │
                 ┌─────────────────────────────────────┘
                 ▼
aggregator: fold envelopes until fold_deadline,  send AggregateEnvelope
                                                       │
                 ┌─────────────────────────────────────┘
                 ▼
any committee server: receive,  group.execute(SubmitAggregate)
                                                       │
                                                       ▼
every committee server: see committed aggregate,  compute its partial,
                        group.execute(SubmitPartial)
                                                       │
                                                       ▼
state machine: all N_S partials gathered → finalize()  → apply() pushes
                                                           BroadcastRecord
                                                       │
                                                       ▼
apply-watcher on each server: mirror to LiveRoundCell / Broadcasts

Round latency is dominated by fold_deadline + one Raft commit round trip per SubmitAggregate and one per SubmitPartial.
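The unblinding step at the end of the pipeline is pure XOR algebra, which a toy two-client / two-server round can demonstrate (the keyed-xorshift pad below is a stand-in; real pads come from the HKDF-SHA256 + AES-128-CTR generator in zipnet-proto):

```rust
// Toy DC-net round: each pad appears once in a client envelope and once in
// a server partial, so XOR-ing the aggregate with every partial cancels
// all pads and reveals every slot.
fn xor_into(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= s;
    }
}

// Stand-in shared pad between one client and one server (NOT the real generator).
fn pad(client: u64, server: u64, len: usize) -> Vec<u8> {
    let mut x = client.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ (server + 1);
    (0..len)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            x as u8
        })
        .collect()
}

fn main() {
    const SLOT: usize = 4;
    const LEN: usize = 2 * SLOT; // one slot per client
    let msgs: [&[u8; SLOT]; 2] = [b"hey!", b"sup?"];

    // Client: write own slot, zero elsewhere, XOR in one pad per server.
    let mut aggregate = vec![0u8; LEN];
    for (c, msg) in msgs.iter().enumerate() {
        let mut env = vec![0u8; LEN];
        env[c * SLOT..(c + 1) * SLOT].copy_from_slice(*msg);
        for s in 0..2u64 {
            xor_into(&mut env, &pad(c as u64, s, LEN));
        }
        xor_into(&mut aggregate, &env); // aggregator folds envelopes
    }

    // Each server's partial: XOR of its pads with every roster client.
    let mut opened = aggregate.clone();
    for s in 0..2u64 {
        let mut partial = vec![0u8; LEN];
        for c in 0..2u64 {
            xor_into(&mut partial, &pad(c, s, LEN));
        }
        xor_into(&mut opened, &partial);
    }

    assert_eq!(&opened[..SLOT], b"hey!");
    assert_eq!(&opened[SLOT..], b"sup?");
}
```

Until the last partial is folded in, the aggregate is indistinguishable from random to any proper subset of the committee — which is the any-trust property in miniature.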

Participant roles

Clients

Implemented in zipnet_node::roles::client. Each client is an Arc<Network> bonded to UNIVERSE, tagged zipnet.client, carrying a zipnet.bundle.client ticket on its PeerEntry. Event loop:

loop {
    live.when().updated().await;
    let header = live.get();
    if header.round == last { continue; }
    if !header.clients.contains(&self.id) {
        // not on this round's roster: retry registration, wait for next header
        continue;
    }
    let bundles = servers.get_all_in(header.servers);
    let sealed  = zipnet_core::client::seal(
        self.id, &self.dh, msg, header.round, &bundles, params,
    )?;
    envelopes.send(sealed.envelope).await?;
    last = header.round; // never seal twice for the same round
}

Aggregator

Implemented in zipnet_node::roles::aggregator. ClientRegistry writer. ClientToAggregator consumer. AggregateToServers producer. Does not join the committee group.

loop {
    live.when().updated().await;
    let header = live.get();
    let mut fold = RoundFold::new(header.round, params);
    let close = tokio::time::sleep(fold_deadline);
    tokio::pin!(close); // Sleep must be pinned to poll by &mut in select!
    loop {
        tokio::select! {
            _ = &mut close => break,
            Some(env) = envelopes.next() => {
                if env.round != header.round
                    || !header.clients.contains(&env.client) {
                    continue;
                }
                fold.absorb(&env)?;
            }
        }
    }
    if let Ok(agg) = fold.finish() {
        aggregates.send(agg).await?;
    }
}

Committee servers

Implemented in zipnet_node::roles::server. Joins Group<CommitteeMachine> as a Writer of ServerRegistry, LiveRoundCell, and Broadcasts; reads ClientRegistry. Single tokio::select! over three sources:

  1. group.when().committed().advanced() — drives the apply-watcher.
  2. AggregateToServers::consumer — feeds inbound aggregates via execute(SubmitAggregate).
  3. A periodic tick — leader-only round driver that opens new rounds via execute(OpenRound).

Why a dedicated Group<CommitteeMachine> and not just collections

The collections are each backed by their own internal Raft group. In principle all round orchestration could be pushed into a bespoke collection. We use a dedicated StateMachine because:

  1. Round orchestration needs domain transitions (Open → Aggregate → Partials → Finalize). These are hostile to Map / Vec / Cell CAS operations.
  2. Apply-time validation (e.g. rejecting aggregates that name non-roster clients) reads more clearly in apply(Command) than spread across collection CAS sequences.
  3. signature() is a clean place to pin wire / parameter version so incompatible nodes never form the same group.

The collections still pull their weight: they are the public-facing state external agents read without joining the committee group.

Identity universe

All IDs are 32-byte blake3 digests, via mosaik’s UniqueId. The aliases used in v1:

| Alias | Derivation | Scope |
|---|---|---|
| NetworkId | zipnet::UNIVERSE = unique_id!("mosaik.universe") | shared universe |
| INSTANCE | blake3("zipnet." + instance_name) | one per deployment |
| GroupId | mosaik-derived from GroupKey(INSTANCE.derive("committee")) + ConsensusConfig + signature() + validators | one per deployment’s committee |
| StreamId / StoreId | INSTANCE.derive("submit"), INSTANCE.derive("broadcasts"), etc. in the target layout | one per public primitive |
| ClientId | blake3_keyed("zipnet:client:id-v1", dh_pub) | stable across runs iff dh_pub is persisted |
| ServerId | blake3_keyed("zipnet:server:id-v1", dh_pub) | same |
| PeerId | iroh’s ed25519 public key | one per running Network |

ClientId / ServerId are not iroh PeerIds. They’re stable across restarts iff the X25519 secret is persisted. In v1 (mock TEE default) every client run generates a fresh identity; in the TDX path the secret is sealed and ClientId becomes a long-lived pseudonym.

Current-state caveat: ZIPNET_SHARD

The v1 binaries (zipnet-server, zipnet-aggregator, zipnet-client) still take a ZIPNET_SHARD flag and derive a fresh NetworkId from unique_id!("zipnet.v1").derive(shard). This predates the UNIVERSE + instance-salt design and will be retired as the binaries migrate to Zipnet::bind on UNIVERSE. Treat it as a pre-migration artifact; new code should not replicate the pattern. The e2e integration test exercises this path today.

Boundary between zipnet-proto / zipnet-core / zipnet-node

  • zipnet-proto — wire types, crypto primitives, XOR. No mosaik types, no async, no I/O. Anything that could be reused by an alternative transport lives here.
  • zipnet-core — Algorithm 1/2/3 as pure functions. Depends on proto; no async, no I/O. The pure-DC-net round-trip test lives here.
  • zipnet-node — mosaik integration. Owns CommitteeMachine, all declare! items, all role loops. Everything async, everything I/O.
  • zipnet — SDK facade. Wraps zipnet-node behind Zipnet::bind(&network, "instance_name"); hides mosaik types from consumers.

See Crate map for the full workspace layout and design-intro — Narrow public surface for the rationale behind the facade boundary.

Cross-references

Crate map

audience: contributors

Workspace at /Users/karim/dev/flashbots/zipnet/. Edition 2024, MSRV 1.93. Mosaik pinned to =0.3.17 (see CLAUDE.md for rationale).

zipnet-proto  (pure: no mosaik, no tokio, no I/O)
    ▲
    │
zipnet-core   (pure: no mosaik, no tokio, no I/O)
    ▲
    │
zipnet-node   ── mosaik 0.3.17 ── iroh 0.97 (QUIC)
    ▲ ▲
    │ └──────────────────────────┐
    │                            │
zipnet (SDK facade)              ├── zipnet-client
                                 ├── zipnet-aggregator
                                 └── zipnet-server

The split between -proto, -core, and -node is load-bearing, not cosmetic. Anything that touches tokio, mosaik, or I/O must live in -node (or higher). Anything that could be reused by an alternative transport lives in -proto / -core. If you find yourself reaching for tokio::spawn or mosaik:: inside -proto or -core, you are in the wrong crate.

zipnet-proto

Pure wire types and crypto primitives. No mosaik, no async.

Module  Role
wire    ClientEnvelope, AggregateEnvelope, PartialUnblind, BroadcastRecord, ClientId, ServerId, RoundId
crypto  HKDF-SHA256 salt composition, AES-128-CTR pad generator, blake3 falsification tag
keys    DhSecret (X25519 StaticSecret), ClientKeyPair, ServerKeyPair, public ClientBundle / ServerBundle
params  RoundParams (broadcast shape)
xor     xor_into, xor_many_into over equal-length buffers
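The xor module is small enough to sketch in full. A minimal pure-Rust equivalent, assuming the equal-length contract noted above (the function names are the real ones; the bodies here are illustrative, not copied from zipnet-proto):

```rust
/// XOR `src` into `dst` in place. Both buffers must be the same length.
pub fn xor_into(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "XOR operands must be equal length");
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= s;
    }
}

/// Fold several equal-length buffers into `dst` by XOR.
pub fn xor_many_into(dst: &mut [u8], srcs: &[&[u8]]) {
    for src in srcs {
        xor_into(dst, src);
    }
}

fn main() {
    let mut acc = vec![0u8; 4];
    let srcs: [&[u8]; 2] = [&[1, 2, 3, 4], &[1, 2, 3, 4]];
    xor_many_into(&mut acc, &srcs);
    assert_eq!(acc, vec![0u8; 4]); // XOR-ing the same buffer in twice cancels
}
```

The self-inverse property shown in `main` is exactly what the DC-net algebra leans on: pads XOR-ed in by clients are XOR-ed back out by server partials.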

WIRE_VERSION is bumped any time a wire or params shape changes. CommitteeMachine::signature() in zipnet-node mixes this in so nodes with different wire versions will never form a group.

zipnet-core

Paper’s algorithms as pure functions over -proto types. No async.

Module                   Role
client::seal             Algorithm 1 — TEE-side sealing of one envelope
aggregator::RoundFold    Algorithm 2 — stateful XOR fold of envelopes for one round
server::partial_unblind  Algorithm 3 — per-server partial computation
server::finalize         Committee combine — aggregate + partials → broadcast
slot                     Deterministic slot assignment + slot layout helpers

The full round trip is exercised by server::tests::e2e_two_servers_three_clients, which constructs a 3-server / 4-client setup (2 talkers + 2 cover) and asserts that the final BroadcastRecord contains each talker’s plaintext at the expected slot with a valid falsification tag. No transport is involved — this is the pure-algebra proof.
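To give a feel for the slot layout that test checks, here is a toy sketch: slot i occupies bytes [i * slot_bytes, (i + 1) * slot_bytes) of the broadcast buffer. The helper names are hypothetical; the real helpers live in the slot module:

```rust
use std::ops::Range;

/// Byte range of slot `slot` in a broadcast laid out as num_slots * slot_bytes.
fn slot_range(slot: usize, slot_bytes: usize) -> Range<usize> {
    slot * slot_bytes..(slot + 1) * slot_bytes
}

/// Place a payload at its assigned slot; shorter payloads are zero-padded
/// by virtue of the buffer starting zeroed.
fn write_slot(broadcast: &mut [u8], slot: usize, slot_bytes: usize, payload: &[u8]) {
    assert!(payload.len() <= slot_bytes, "payload must fit its slot");
    let range = slot_range(slot, slot_bytes);
    broadcast[range][..payload.len()].copy_from_slice(payload);
}

fn main() {
    let (num_slots, slot_bytes) = (4, 8);
    let mut broadcast = vec![0u8; num_slots * slot_bytes];
    write_slot(&mut broadcast, 2, slot_bytes, b"hello");
    assert_eq!(&broadcast[16..21], b"hello"); // slot 2 starts at byte 16
}
```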

zipnet-node

The only non-SDK crate that imports mosaik. Hosts the declare! items, the committee state machine, and the role event loops.

Module             Role
protocol           declare::stream! + declare::collection! items, tag constants, ticket class constants
committee          CommitteeMachine: StateMachine, Command, Query, QueryResult, LiveRound, CommitteeConfig
tickets            BundleValidator<K>: TicketValidator for client / server bundle tickets
roles::common      NetworkBoot helper that wraps iroh secret, tags, tickets, and mDNS setup
roles::client      client event loop
roles::aggregator  aggregator event loop
roles::server      committee server event loop (single tokio::select! over three event sources)

The role modules are reusable as a library — the three binaries are thin CLI wrappers around them. Test code in crates/zipnet-node/tests/e2e.rs reuses the same primitives but inlines the server loop so it can inject a pre-built Arc<Network> and cross-sync_with all peers before anything starts (same pattern as mosaik’s examples/orderbook).

protocol.rs today vs target

protocol.rs currently declares its StreamId / StoreId literals as flat strings ("zipnet.stream.client-to-aggregator", etc.). The target per design-intro is INSTANCE.derive("submit") / .derive("broadcasts") / … chained off the per-deployment instance salt so multiple instances can coexist on one mosaik universe without colliding. The migration removes the ZIPNET_SALT.derive(shard) NetworkId scoping in favour of the shared zipnet::UNIVERSE constant.

zipnet (SDK facade)

Public surface for consumers. Wraps zipnet-node and hides all mosaik types (StreamId, StoreId, GroupId) from callers.

Module        Role
environments  UNIVERSE constant, instance_id(&str) fn, instance_id! macro
client        Zipnet::bind, Zipnet::bind_by_id, publish, subscribe, shutdown
error         Error { WrongUniverse, ConnectTimeout, Attestation, Shutdown, Protocol }
types         Receipt, Round, Outcome, Message
driver        internal task that plumbs publishes onto ClientToAggregator and broadcasts back

Re-exports from mosaik that the SDK intentionally surfaces: UniqueId, NetworkId, Tag, unique_id!. Nothing else is re-exported — callers that need raw mosaik types have fallen off the supported path and should drop to zipnet-node directly.

zipnet::instance_id(name) and zipnet::instance_id!("name") must produce byte-identical outputs; the macro lowers to mosaik::unique_id!(concat!("zipnet.", $name)) and the runtime fn is UniqueId::from("zipnet." + name). If you change one, change the other.

Binaries

Thin CLI wrappers around zipnet-node::roles::*. In v1 they still take a ZIPNET_SHARD flag and scope to ZIPNET_SALT.derive(shard); this predates the UNIVERSE + instance design and will be retired as the binaries migrate to Zipnet::bind on UNIVERSE.

Crate              Flags of note
zipnet-client      ZIPNET_MESSAGE, ZIPNET_CADENCE
zipnet-aggregator  ZIPNET_FOLD_DEADLINE
zipnet-server      ZIPNET_COMMITTEE_SECRET, ZIPNET_MIN_PARTICIPANTS, ZIPNET_ROUND_PERIOD, ZIPNET_ROUND_DEADLINE

Each binary also takes the common ZIPNET_SHARD, ZIPNET_SECRET, ZIPNET_BOOTSTRAP, ZIPNET_METRICS — see Environment variables.

Feature flags

  • zipnet-node/tee-tdx (off by default) — folds mosaik::tickets::Tdx::new().require_own_mrtd()? into the committee’s admission validators. Requires mosaik’s tdx feature (on by default) and TDX hardware.
  • zipnet-client/tee-tdx, zipnet-server/tee-tdx — re-export flips of the node crate’s flag.

Mock TEE is the default path (// SIMPLIFICATION: in source); TDX is opt-in for v1 and the critical-path enforcement lands in v2 (see Roadmap).

Dependency choices worth knowing

  • x25519-dalek 2.0 pins rand_core 0.6 (not workspace rand 0.9). We break workspace coherence in zipnet-proto/Cargo.toml by pulling rand_core = "0.6" explicitly for OsRng compatibility with StaticSecret::random_from_rng. The crate-proper rand dep is workspace-pinned.
  • mosaik = "=0.3.17" — the API we developed against. Upgrades are expected to break compile; the declare::stream! / declare::collection! macros are stable-ish, the ticket and group APIs have shifted across minor versions.

Cryptography

audience: contributors

All cryptographic primitives live in zipnet-proto. This chapter is a rationale + proof-sketch document; correctness tests are in zipnet-proto::crypto::tests and the end-to-end algebraic test is zipnet_core::server::tests::e2e_two_servers_three_clients. Nothing on this page is deployment-topology-specific — the KDF schedule and falsification-tag construction are identical under any instance layout. See design-intro for how the instance salt (and hence schedule_hash, once footprint scheduling lands in v2) attaches to a deployment.

Primitives

Purpose                Primitive            Crate
Key agreement          X25519               x25519-dalek 2.0
Key derivation         HKDF-SHA256          hkdf 0.12
Pad generation         AES-128 in CTR mode  aes 0.8 + ctr 0.9
Falsification tag      keyed-blake3         blake3 1.8
ID derivation          keyed-blake3         blake3 1.8
Peer-entry signatures  ed25519              via iroh

Notable negatives: no signatures from the prototype itself — clients do not ed25519-sign their envelopes because iroh already signs the PeerEntry that carries their bundle and the stream transport is authenticated QUIC. We rely on mosaik’s session security, not on an application-level signature scheme.

Per-round key schedule

For each (client, server, round) pair the protocol computes a one-time pad P of length B = num_slots * slot_bytes:

  shared  = X25519(client_sk, server_pk)                // 32 bytes
  salt    = params_prefix ‖ round ‖ schedule_hash       // 56 bytes
  prk     = HKDF-Extract(salt, shared)                  // 32 bytes
  key     = HKDF-Expand(prk, "zipnet/pad/v1", 16)       // 16 bytes
  iv      = round_le ‖ zeros                            // 16 bytes
  P       = AES-128-CTR(key, iv, zeros of length B)

where params_prefix is a little-endian encoding of (wire_version, num_slots, slot_bytes, tag_len) and schedule_hash is the 32-byte NO_SCHEDULE constant in v1 (the footprint scheduling reservation vector hash in v2).
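The salt and IV composition can be sketched directly. This assumes each params_prefix field is encoded as a little-endian u32, which is consistent with the stated 56-byte total (16 + 8 + 32) but is an assumption — check zipnet-proto before relying on the field widths:

```rust
// Placeholder for the 32-byte v1 constant; the real value lives in zipnet-proto.
const NO_SCHEDULE: [u8; 32] = [0u8; 32];

/// salt = params_prefix ‖ round ‖ schedule_hash   (56 bytes)
fn pad_salt(wire_version: u32, num_slots: u32, slot_bytes: u32, tag_len: u32, round: u64) -> Vec<u8> {
    let mut salt = Vec::with_capacity(56);
    for field in [wire_version, num_slots, slot_bytes, tag_len] {
        salt.extend_from_slice(&field.to_le_bytes()); // params_prefix: 16 bytes
    }
    salt.extend_from_slice(&round.to_le_bytes());     // round: 8 bytes
    salt.extend_from_slice(&NO_SCHEDULE);             // schedule_hash: 32 bytes
    salt
}

/// iv = round_le ‖ zeros   (16 bytes) — a fresh counter space per round.
fn pad_iv(round: u64) -> [u8; 16] {
    let mut iv = [0u8; 16];
    iv[..8].copy_from_slice(&round.to_le_bytes());
    iv
}

fn main() {
    assert_eq!(pad_salt(1, 32, 512, 16, 7).len(), 56);
    assert_ne!(pad_iv(1), pad_iv(2)); // distinct rounds never share an IV
}
```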

Why this structure

  • Salt over (params, round, schedule_hash) binds the pad to every negotiated round parameter. A client or server computing with a different RoundParams derives a different pad; in the XOR algebra this reduces the colliding result to noise, not to a silent crypto vulnerability. The WIRE_VERSION in the salt prefix extends this to major-version boundaries.
  • HKDF-Extract over the raw DH shared secret, not a hash of it. An X25519 shared secret is a high-entropy curve-point encoding, not a uniform byte string; HKDF’s extract step is the standard way to convert it into a uniform PRK.
  • AES-128-CTR with a round-prefixed IV. A fresh IV = (round‖0⁸) gives every round a non-overlapping counter space; the sequence of counters within a round is (round‖0⁸) + 0, 1, 2, .... As long as two rounds never share round, the AES key–IV pair is never reused. The round: u64 ensures uniqueness across realistic deployments.
  • HKDF-Expand labelled "zipnet/pad/v1". The label guards against accidental reuse of the same PRK across crypto contexts; bumping it to "zipnet/pad/v2" is free domain separation.
  • AES-128 over a stream cipher. AES-NI accelerated; output is pseudorandom; the commutativity that DC nets require (XOR) is immediate.

What this buys

Any honest client C and honest server S that agree on the seven inputs (shared_secret, wire_version, num_slots, slot_bytes, tag_len, round, schedule_hash) derive byte-identical pads. XOR is commutative and associative, so the order in which the aggregator and the committee fold in their contributions is irrelevant.

For any adversary who does not know shared_secret, the pad is indistinguishable from uniformly random under the standard DDH assumption on Curve25519 (for the X25519 step) and the PRF security of AES-128 (for the expansion step), given a secure HKDF.

What this does not buy

  • Forward secrecy. A compromise of shared_secret compromises every past and future round for that (client, server) pair until the secret is rotated. v2 ratchets shared_secret ← HKDF-Extract(shared_secret, "ratchet") at each round boundary.
  • Authentication of the envelope itself. The mosaik transport authenticates the sender PeerId (ed25519); the pad binds the envelope to round and client via the KDF inputs. But an adversary who can inject bytes at the transport layer as a specific peer can replay or mutate envelopes. We rely on iroh’s QUIC/TLS.

Falsification tags

The paper’s §3 “falsification tag” is a keyed-blake3 XOF of the plaintext message:

pub fn falsification_tag(message: &[u8], tag_len: usize) -> Vec<u8> {
    let key = blake3::derive_key("zipnet:falsification-tag:v1", &[]);
    let mut h = blake3::Hasher::new_keyed(&key);
    h.update(message);
    let mut buf = vec![0u8; tag_len];
    h.finalize_xof().fill(&mut buf);
    buf
}

Why keyed-blake3, not HMAC

  • Keyed-blake3 is a PRF under the standard security argument for blake3-keyed and is enormously faster than HMAC-SHA256 at the sizes involved.
  • The key is a domain-separating constant ("zipnet:falsification-tag:v1") not a secret; the goal is not authentication from an adversary, it’s cross-slot collision resistance.

What the tag protects against

  • Malicious client corrupting another honest client’s slot. Slots are deterministically assigned (v1) or reservation-checked (v2). Collisions across clients overwrite both messages with their XOR. An honest client’s tag is computed on its original message; after the XOR with garbage, the tag at the published slot no longer matches the visible payload bytes → any observer rejects the slot as corrupted.
  • Malicious client writing garbage in an unused slot. The unused-slot hypothesis fails the tag check; observers skip it.

What the tag does not protect against

  • A malicious client corrupting its own slot by writing nonsense and computing a tag over that nonsense. In v1 this is a trivial DoS against the client itself; the protocol treats the published broadcast as authoritative.
  • Cross-round correlation attacks based on message length or pattern.

Identity derivation

ClientId = blake3_keyed("zipnet:client:id-v1", dh_pub), ServerId = blake3_keyed("zipnet:server:id-v1", dh_pub), both XOF’d to 32 bytes.

Separate domain strings per role prevent an adversary who harvests a client’s dh_pub from spoofing a server with the same identifier, which would matter if we ever compared ClientIds and ServerIds inside the state machine (we don’t, but the separation is free).

Constant-time concerns

  • X25519 in x25519-dalek is constant-time by design.
  • AES-128-CTR in aes + ctr uses AES-NI on recent x86_64 / ARM — the assembly path is constant-time.
  • HKDF (SHA-256) is constant-time over inputs of a fixed length.
  • XOR buffers are word-wise and constant-time.
  • The equality check for tag verification is Vec::eq, which is not constant-time. This is fine: tag comparison is against a public broadcast, not against a secret.

If a contributor adds a secret comparison path, they should reach for subtle::ConstantTimeEq rather than ==.
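For reference, the shape subtle::ConstantTimeEq implements is the accumulate-all-differences loop — OR every byte difference together so the loop never exits early on the first mismatch. A toy sketch of the idea (use the subtle crate, not this, in real code):

```rust
/// Constant-time-style equality: no early exit on mismatch, so timing does
/// not reveal the position of the first differing byte.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // length is assumed public
    }
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    diff == 0
}

fn main() {
    assert!(ct_eq(b"tag", b"tag"));
    assert!(!ct_eq(b"tag", b"tah"));
}
```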

Cryptographic agility

None. The prototype nails down the curve (X25519), hashes (blake3, SHA-256), and cipher (AES-128) because each choice is folded into a string constant in the KDF. To change any of them, bump WIRE_VERSION and the corresponding label ("zipnet/pad/v1" → "zipnet/pad/v2").

Rotating the curve to, say, X448 would require a new DhSecret type and a corresponding ClientBundle / ServerBundle layout change. There is no on-wire negotiation of crypto parameters — nodes that disagree are isolated into disjoint groups by construction.

The committee state machine

audience: contributors

Source: crates/zipnet-node/src/committee.rs.

Trait shape

impl StateMachine for CommitteeMachine {
    type Command     = Command;
    type Query       = Query;
    type QueryResult = QueryResult;
    type StateSync   = LogReplaySync<Self>;

    fn signature(&self) -> UniqueId { ... }
    fn apply(&mut self, cmd: Command, ctx: &dyn ApplyContext) { ... }
    fn query(&self, q: Query)        -> QueryResult { ... }
    fn state_sync(&self)             -> LogReplaySync<Self> { LogReplaySync::default() }
}

LogReplaySync is the default; the committee state is small (< 1 KB per round) so replaying the log on catch-up is cheap. When we add per-round archival in v2 we’ll swap in a snapshot strategy.

Commands

pub enum Command {
    OpenRound(LiveRound),
    SubmitAggregate(AggregateEnvelope),
    SubmitPartial(PartialUnblind),
}

Each command is idempotent:

  • OpenRound: resets current to a fresh InFlight(header). If a previous round was not finalized, its state is silently dropped — the leader is the authority on when to move on.
  • SubmitAggregate: first valid submission wins. Duplicates from follower forwarding are silently ignored. Validation checks:
    • round matches current.header.round,
    • payload length matches config.params.broadcast_bytes(),
    • participant set is non-empty,
    • every participant is in current.header.clients (no rogue clients).
  • SubmitPartial: first partial per (round, server) wins. Validation:
    • round matches,
    • partial length matches,
    • server is in current.header.servers.

When a partial submission brings the total to N_S and an aggregate has been submitted, apply() calls zipnet_core::server::finalize(...) and pushes the resulting BroadcastRecord into self.broadcasts. Everything after that is apply()-synchronous and deterministic.
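The first-valid-submission-wins rule can be sketched with a map keyed by server; all names below are illustrative stand-ins, not the real committee.rs types:

```rust
use std::collections::{hash_map::Entry, HashMap};

/// Toy model of per-round partial admission: one partial per (round, server).
struct Partials {
    round: u64,
    by_server: HashMap<[u8; 32], Vec<u8>>, // ServerId -> partial bytes
}

impl Partials {
    /// Returns true iff the submission was admitted.
    fn submit(&mut self, round: u64, server: [u8; 32], partial: Vec<u8>) -> bool {
        if round != self.round {
            return false; // wrong round: rejected
        }
        match self.by_server.entry(server) {
            // First submission for this server wins...
            Entry::Vacant(v) => {
                v.insert(partial);
                true
            }
            // ...duplicates are silently ignored.
            Entry::Occupied(_) => false,
        }
    }
}

fn main() {
    let mut p = Partials { round: 9, by_server: HashMap::new() };
    assert!(p.submit(9, [1; 32], vec![0xaa]));
    assert!(!p.submit(9, [1; 32], vec![0xbb])); // duplicate: dropped
    assert_eq!(p.by_server[&[1; 32]], vec![0xaa]); // first wins
}
```

Because admission is a pure function of (round, server, already-seen set), replaying the log after a crash reproduces identical state — the idempotency property the commands rely on.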

Queries

pub enum Query {
    LiveRound,
    CurrentAggregate,
    PartialsReceived,
    RecentBroadcasts(u32),
}

Queries are read-only and do not replicate. The apply-watcher task on each server uses weak-consistency queries to drive its side effects (mirror LiveRound to LiveRoundCell, push broadcasts into the Broadcasts vec collection, issue partial submissions when an aggregate appears).

Signature versioning

fn signature(&self) -> UniqueId {
    let tag = format!(
        "zipnet.committee.v{WIRE_VERSION}.slots={}.bytes={}.min={}",
        self.config.params.num_slots,
        self.config.params.slot_bytes,
        self.config.min_participants,
    );
    UniqueId::from(tag.as_str())
}

signature() is folded into the GroupId by mosaik, alongside the GroupKey (derived from INSTANCE.derive("committee")) and the consensus config. Therefore:

  • Bumping WIRE_VERSION (wire or params breaking change) isolates old nodes from new.
  • Changing num_slots, slot_bytes, or min_participants likewise forces a fresh group, so nodes can’t silently fork on divergent config.
  • Changing the instance name (and hence INSTANCE) disjoins the deployments; two acme.mainnet / acme.testnet deployments share no GroupId even under identical params. See design-intro — Instance-salt discipline.

If you add a field to CommitteeConfig or change apply semantics without touching signature(), two nodes with incompatible code will form the same group and diverge at the apply level. Always bump the signature string when apply() or Command semantics change. That’s the invariant.

What this machine guarantees vs. does not

The state machine guarantees round ordering, exactly-once partial admission, and deterministic finalization under Raft’s normal crash-fault tolerance. It deliberately guarantees nothing about anonymity — anonymity is a property of the cryptographic protocol (any-honest-server DC-net algebra, see Threat model), not of consensus. Byzantine committee members cannot break anonymity via the state machine path; they can only withhold or submit bogus partials, which is an availability problem.

Apply-context usage

ApplyContext exposes deterministic metadata. We use it only in a debug log right now:

debug!(
    round = %header.round,
    "committee: opening round at index {:?}",
    ctx.log_position(),
);

Anything derived from ctx is safe to use in state mutation because mosaik guarantees it is identical on every replica. If v2 needs a per-round random salt, pulling it from ctx.log_position() and ctx.current_term() is the deterministic path.

The apply-watcher

The reason apply() doesn’t write directly to the public collections: apply() is synchronous and must be free of I/O to keep the state machine deterministic. Side effects on the outside world go through a task that polls the group after every commit advance:

tokio::select! {
    _ = group.when().committed().advanced() => {
        let live  = group.query(Query::LiveRound, Weak).await?.into();
        let agg   = group.query(Query::CurrentAggregate, Weak).await?.into();
        let recent = group.query(Query::RecentBroadcasts(8), Weak).await?.into();
        reconcile_into_collections(live, agg, recent).await;
        maybe_submit_my_partial(agg).await;
    }
    // ...
}

This is the same pattern the mosaik book recommends for “state machine emits events, side-effect task consumes them”. Because queries are weak-consistency reads of the local replica, they are lock-free and fast; by the time we see the commit advance, the local apply has already run.

Idempotency and replays

  • A follower that crashes mid-apply replays the log on recovery. Because apply() is deterministic, replaying yields the same state.
  • A client that never sees its round finalized and retries on the next LiveRound is safe: the new round has a fresh RoundId, new pads, new envelope. No anti-replay logic is needed at the protocol layer.
  • An aggregator retrying SubmitAggregate after a leader flip is safe: the state machine rejects duplicates.
  • A server retrying SubmitPartial after its own restart is safe for the same reason.

Sizes of in-flight state

Field                Size per round
LiveRound.clients    N * 32 bytes
LiveRound.servers    N_S * 32 bytes
aggregate.aggregate  B bytes (default 16 KiB)
partials             N_S * (32 + 8 + B) bytes

Finalization pushes one BroadcastRecord (size: B + N*32 + N_S*32) into self.broadcasts which is retained in RAM indefinitely in v1. For long-running deployments you will want external archival; see Operators — Accounting and audit.
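Plugging sample numbers into the table gives a feel for the magnitudes. A quick sanity-check calculation, assuming N = 100 clients, N_S = 3 servers, and the default B = 16 KiB:

```rust
/// Total in-flight state for one round, per the table above.
fn round_state_bytes(n: usize, n_s: usize, b: usize) -> usize {
    n * 32            // LiveRound.clients
    + n_s * 32        // LiveRound.servers
    + b               // aggregate.aggregate
    + n_s * (32 + 8 + b) // partials
}

/// Size of one finalized BroadcastRecord: B + N*32 + N_S*32.
fn broadcast_record_bytes(n: usize, n_s: usize, b: usize) -> usize {
    b + n * 32 + n_s * 32
}

fn main() {
    let (n, n_s, b) = (100, 3, 16 * 1024);
    assert_eq!(broadcast_record_bytes(n, n_s, b), 19_680);
    assert_eq!(round_state_bytes(n, n_s, b), 68_952);
}
```

So at these sizes a round costs well under 100 KB in RAM, but the retained broadcasts grow by roughly 20 KB per round forever — hence the archival note above.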

Mosaik integration notes

audience: contributors

Drop-in advice, footguns, and places where the prototype bumped into the mosaik 0.3.17 API. This is a grab-bag — sorted roughly by how likely a contributor is to trip over each item. For the higher-level deployment conventions that sit above mosaik, see design-intro.

Instance-salt derivation

Every public id in a zipnet deployment descends from the instance salt:

use mosaik::{UniqueId, unique_id};

// Compile-time: typos become build errors.
pub const ACME: UniqueId = zipnet::instance_id!("acme.mainnet");
//                         expands to unique_id!("zipnet.acme.mainnet")

// Runtime: same 32 bytes as the macro for the same name.
let id = zipnet::instance_id("acme.mainnet");
assert_eq!(id, ACME);

// Sub-ids chain with .derive().
let committee_key   = ACME.derive("committee");    // GroupKey material
let submit_stream   = ACME.derive("submit");       // StreamId
let broadcasts_store = ACME.derive("broadcasts");  // StoreId

The invariant: instance_id(name) and instance_id!("name") must produce byte-identical outputs. The macro lowers to unique_id!(concat!("zipnet.", $name)); the runtime fn is UniqueId::from("zipnet." + name). Change one, change the other. Never expose raw StreamId / StoreId / GroupId values across the zipnet crate boundary — Zipnet::bind is the only supported path.

The declare::stream! predicate direction

Reading the macro source (mosaik-macros/src/stream.rs in the mosaik repo) reveals the following:

“For require and require_ticket, the side prefix describes who must satisfy the requirement, not who performs the check. consumer require_ticket: V means consumers need a valid ticket, so the producer runs the validator — route to the opposite side.”

So in our ClientToAggregator stream:

declare::stream!(
    pub ClientToAggregator = ClientEnvelope,
    "zipnet.stream.client-to-aggregator",
    producer require: |p| p.tags().contains(&CLIENT_TAG),
    consumer require: |p| p.tags().contains(&AGGREGATOR_TAG),
    producer online_when: |c| c.minimum_of(1).with_tags("zipnet.aggregator"),
);

  • producer require: |p| p.tags().contains(&CLIENT_TAG) → “the producer must have the zipnet.client tag” → enforced on the consumer side (aggregator subscribes only to peers tagged zipnet.client).
  • consumer require: |p| p.tags().contains(&AGGREGATOR_TAG) → “the consumer must have the zipnet.aggregator tag” → enforced on the producer side (client accepts subscribers only if they’re tagged zipnet.aggregator).

Getting this inverted produces symptoms like rejected consumer connection: unauthorized in the producer logs, with consumer PeerEntry tag counts of 1 that don’t match the expected role. The clue is that the producer is the one rejecting; consumer-requires apply on the producer.

Without both clauses, any peer on the network could subscribe to your client’s envelope stream — defeating the point. The ticket-based analog is require_ticket, which is what you want in the TDX-enabled path.

Group<M>, Map<K,V>, Network are not Clone

All three hold Arc internally but don’t derive or implement Clone. When you need to share them across spawned tasks, wrap in a fresh Arc:

let group   = Arc::new(network.groups()...join());
let network = Arc::new(builder.build().await?);

tokio::spawn({
    let group = Arc::clone(&group);
    async move { ... group.execute(...).await ... }
});

Group::execute, Group::query, Group::feed return futures that are 'static — they take ownership of the arguments they need at the moment of call, so passing Arc<Group> + Arc::clone() into each task is the straightforward pattern.

The server role deliberately keeps the Group inside a single tokio::select! rather than spawning task-per-responsibility so we avoid the Arc noise. The integration test in zipnet-node/tests/e2e.rs does the same.

QueryResultAt<M> doesn’t pattern-match directly

group.query(...).await? returns Result<QueryResultAt<M>, QueryError<M>> where QueryResultAt<M> is #[derive(Deref)] with Target = M::QueryResult. You cannot pattern-match QueryResultAt against variants of your QueryResult. The canonical destructure:

let qr = group.query(Query::LiveRound, Consistency::Weak).await?;
let QueryResult::LiveRound(live) = qr.into() else { return Ok(()) };

QueryResultAt::into is inherent (not From) and returns the M::QueryResult by value.

Cell write / clear

let cell = LiveRoundCell::writer(&network);
cell.set(header).await?;   // atomic replace
cell.clear().await?;       // empty

There is no unset — the method is clear. Cell already has Option-like emptiness semantics, so Cell<T> gives you the “sometimes present” store you’d expect; no need for Cell<Option<T>>.

StateMachine::apply can’t be async

Apply is synchronous by contract. Side effects that need async (e.g. writing to a collection, sending a stream, issuing another command) must happen in a separate task that watches the commit cursor and reads the state machine via queries:

loop {
    tokio::select! {
        _ = group.when().committed().advanced() => reconcile().await?,
        Some(msg) = stream.next() => forward(msg).await?,
        _ = period.tick() => maybe_open_round().await?,
    }
}

The apply-watcher in zipnet-node/src/roles/server.rs::reconcile_state is the canonical implementation in our prototype.

InvalidTicket is a unit struct

mosaik::tickets::InvalidTicket doesn’t have ::new; it’s a bare struct InvalidTicket;. Return it as:

return Err(InvalidTicket);

Context goes into the tracing log, not into the error, because the error is opaque at the protocol level.

GroupKey::from(Digest)

GroupKey: From<Secret> where Secret = Digest. The ergonomic constructor from a caller-provided string:

let key = GroupKey::from(mosaik::Digest::from("my-committee-secret"));

GroupKey::from_secret(impl Into<Secret>) is the same thing; either works. GroupKey::random() is present but not what you want in production because every committee member must converge on the same value.

Discovery on localhost

iroh’s pkarr/Mainline DHT bootstrap is unreliable for same-box tests. For integration tests, cross-call sync_with between every pair of networks (same pattern as mosaik’s examples/orderbook::discover_all):

async fn cross_sync(nets: &[&Arc<Network>]) -> anyhow::Result<()> {
    for (i, a) in nets.iter().enumerate() {
        for (j, b) in nets.iter().enumerate() {
            if i != j {
                a.discovery().sync_with(b.local().addr()).await?;
            }
        }
    }
    Ok(())
}

For out-of-process binaries, pass an explicit --bootstrap <peer_id> pointing at a well-known node.

Tag = UniqueId, no tag! macro

Book examples show tag!("...") but 0.3.17 exports no such macro. Tag is an alias for UniqueId, so use unique_id!("...") for compile-time construction:

pub const CLIENT_TAG: Tag = unique_id!("zipnet.client");

Runtime construction is Tag::from("...") via the From<&str> impl on UniqueId.

Declaring collections that don’t exist at use time

The declare::collection! macro refers to its value type by path, so you can declare a collection over a type defined later in the same crate:

// src/protocol.rs
use crate::committee::LiveRound;

declare::collection!(
    pub LiveRoundCell = mosaik::collections::Cell<LiveRound>,
    "zipnet.collection.live-round",
);

LiveRound is defined in src/committee.rs; the macro’s expansion resolves the path at compile time in the usual way.

Network::builder(...).with_mdns_discovery(true)

mDNS is off by default in 0.3.17. For single-box testing and for clusters on the same LAN, turning it on collapses discovery latency from minutes (DHT bootstrap) to sub-seconds. Costs nothing on WAN deployments where it silently no-ops.

Network::builder(network_id)
    .with_mdns_discovery(true)
    .with_discovery(discovery::Config::builder().with_tags(tags))
    .build().await?;

We enable it unconditionally in NetworkBoot::boot.

TDX gating: install own ticket, require others’

Mosaik’s TDX support composes on both sides of the peer-entry dance. The idiomatic zipnet committee setup:

// On boot, if built with the tee-tdx feature:
network.tdx().install_own_ticket()?;  // attach our quote to our PeerEntry

// When joining the committee or a public collection, require peers
// to present a matching TDX quote:
use mosaik::tickets::Tdx;
let tdx_validator = Tdx::new().require_mrtd(expected_mrtd);

// Stack with BundleValidator via multi-require_ticket:
group_builder
    .require_ticket(BundleValidator::<ServerBundleKind>::new())
    .require_ticket(tdx_validator);

expected_mrtd comes from the reproducible committee-image build and is published alongside the instance name (see design-intro — A naming convention, not a registry). In v1, BundleValidator is the only admission check in the non-TDX path; TDX critical-path enforcement lands in v2 (Roadmap).

Threat model

audience: contributors

This chapter restates the paper’s adversary model (§3.3) against the concrete objects that exist in our prototype, and gives proof sketches for the claims we make. The claims are scoped to one zipnet instance — the committee Group<CommitteeMachine> identified by INSTANCE.derive("committee") for a given operator-chosen name (see design-intro — Instance-salt discipline). Distinct instances on the same universe have disjoint GroupIds, disjoint rosters, and disjoint anonymity sets; what holds for one says nothing about another. Multi-instance composition is out of scope here.

Goals and non-goals

Goal: unlinkability of (author, message) for messages published in the Broadcasts collection, against any adversary that controls at most N_S − 1 of N_S committee servers, the aggregator, the TEE host (of an unbounded subset of clients), and the network. The adversary does not control a strict majority of the honest clients. (The precise (t, n)-anonymity formulation is in Appendix A of the paper.)

Non-goals:

  • Byzantine fault tolerance of the consensus layer. Mosaik’s Raft variant is crash-fault tolerant, not Byzantine.
  • Availability under any adversarial committee participation. In v1, a single crashed committee server halts round progression.
  • Confidentiality of application payload. Once finalized, broadcast is world-readable by design.
  • Resistance to message-length side channels (see security checklist).

Attacker powers

What the adversary can do:

  1. Read and modify any packet on the wire. iroh/QUIC authenticates peer identities, so the adversary cannot impersonate an honest node, but can block, delay, or corrupt packets (triggering Raft timeouts and stream reconnects).
  2. Control the operating system of any non-TEE node, including committee servers it is designated to operate.
  3. Issue arbitrary Commands to the committee via a corrupt server (which forwards its own commands into the Raft log) or via a corrupt client (which sends arbitrary ClientEnvelopes through the aggregator).
  4. Compromise the TEE of any number of clients (and read their DH secrets) in the v1 mock path.

What the adversary cannot do (by assumption or by protocol):

  1. Compromise the TEE of a client in the v2 TDX path without triggering attestation failure. (Formal: SGX/TDX bound by the hardware root of trust.)
  2. Compromise the DH secret of every committee server simultaneously — anonymity requires at least one honest server.
  3. Force a BroadcastRecord to contain a participants list that includes an unregistered ClientId: the state machine rejects such an aggregate at SubmitAggregate apply time (see committee state machine).

Anonymity sketch

Let C₁, ..., C_N be the clients participating in round r. Each client C_i contributes msg_i ⊕ (XOR over servers of pad_ij) to the aggregate. The aggregate is:

agg_r = XOR_i (msg_i ⊕ XOR_j pad_ij)
      = (XOR_i msg_i) ⊕ (XOR_i XOR_j pad_ij)

The broadcast is agg_r ⊕ (XOR_j partial_j) where partial_j = XOR_i pad_ij. Substituting:

broadcast = (XOR_i msg_i) ⊕ (XOR_i XOR_j pad_ij) ⊕ (XOR_j XOR_i pad_ij)
          = (XOR_i msg_i)              // the inner pads cancel

So the broadcast is exactly the XOR of every client’s slotted message. Given the deterministic slot assignment, messages land in distinct slots (modulo collisions) and can be read back slot-by-slot.
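The cancellation algebra above can be checked mechanically. The following is an illustrative toy round (3 clients, 2 servers, 8-byte slots) with arbitrary fixed bytes standing in for the HKDF-AES pads — the cancellation holds for any pad values, which is the point:

```rust
// Toy DC-net round. Illustrative only: real pads come from the HKDF-AES
// PRF per (client, server, round); any fixed bytes exercise the algebra.
const B: usize = 24; // broadcast vector bytes (3 slots × 8 bytes)

fn xor(a: &mut [u8; B], b: &[u8; B]) {
    for (x, y) in a.iter_mut().zip(b) { *x ^= y; }
}

fn main() {
    let n_clients = 3;
    let n_servers = 2;

    // pads[i][j]: stand-in pad for (client i, server j).
    let pads: Vec<Vec<[u8; B]>> = (0..n_clients)
        .map(|i| (0..n_servers).map(|j| {
            let mut p = [0u8; B];
            for (k, b) in p.iter_mut().enumerate() { *b = (i * 31 + j * 17 + k) as u8; }
            p
        }).collect())
        .collect();

    // Each client writes its message into its own 8-byte slot.
    let mut msgs = vec![[0u8; B]; n_clients];
    for i in 0..n_clients {
        msgs[i][i * 8..(i + 1) * 8].copy_from_slice(&[i as u8 + 1; 8]);
    }

    // envelope_i = msg_i ⊕ XOR_j pad_ij; aggregator folds all envelopes.
    let mut agg = [0u8; B];
    for i in 0..n_clients {
        let mut env = msgs[i];
        for j in 0..n_servers { xor(&mut env, &pads[i][j]); }
        xor(&mut agg, &env);
    }

    // partial_j = XOR_i pad_ij; unblinding removes every pad.
    let mut broadcast = agg;
    for j in 0..n_servers {
        let mut partial = [0u8; B];
        for i in 0..n_clients { xor(&mut partial, &pads[i][j]); }
        xor(&mut broadcast, &partial);
    }

    // The inner pads cancel: broadcast == XOR of the slotted messages.
    let mut expected = [0u8; B];
    for m in &msgs { xor(&mut expected, m); }
    assert_eq!(broadcast, expected);
    println!("slot 0: {:?}", &broadcast[0..8]); // first client's message bytes
}
```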

For unlinkability: given any one honest server j* whose pad secrets are unknown to the adversary, every pad_{ij*} is PRF-indistinguishable from uniform random (under the PRF security of the HKDF-AES construction). Each honest client’s envelope_i = msg_i ⊕ XOR_j pad_ij is therefore PRF-indistinguishable from uniform — the adversary cannot distinguish which honest client authored which envelope. This is the standard DC-net anonymity argument under the any-trust assumption.

The paper strengthens this to a (t, n) game (Appendix A). The state-machine-level permutation check at SubmitAggregate apply ensures the aggregate’s participants vector is a subset of the round’s client roster: any participants shuffle by the adversary is a subset of already-known IDs, so the permutation is within the honest anonymity set.

Integrity: what the state machine guarantees

  • A committed BroadcastRecord is the result of exactly one SubmitAggregate followed by exactly one SubmitPartial per committee member in that round’s servers snapshot. No partial is double-counted; no aggregate is re-applied.
  • Every published broadcast in the log is computable deterministically from the committed commands. A replay (e.g. after a committee server restart) produces the identical byte sequence.

Integrity: what the state machine does not guarantee

  • The honesty of the aggregator’s fold. A malicious aggregator can:

    • omit an envelope (DoS a specific client),
    • include a garbage envelope attributed to a real client’s ClientId (see below),
    • lie about the participants list.

    The state machine rejects a SubmitAggregate whose participants set is not a subset of the LiveRound.clients roster, preventing the aggregator from naming rogue clients. It does not reject an aggregator that names honest clients whose envelopes were never received — but in that case the partial unblinds remove pads that were never added, and the missing client’s slot shows noise rather than a message.

    A malicious aggregator cannot break anonymity; it can only degrade availability and introduce noise into specific slots.

  • The honesty of a committee server’s partial. A malicious server can submit a garbage partial. The broadcast will be XORed with that garbage and published as garbage. The state machine has no way to detect this — DC-net unblinding does not carry a zero-knowledge proof. This is consistent with the paper: malicious servers break availability, not anonymity.

    A v2 mitigation (not in v1) is an anti-disrupter phase modeled on Riposte’s auditing or Blinder’s MPC format check.

Failure modes that break anonymity (not in the adversary model)

  • All committee servers collude. By assumption the any-trust model is void; anonymity is lost. Operators must enforce the any-trust diversity axiom out of band.
  • The same DH secret is used across roles. Re-using a DhSecret between a committee server and a client (a pathological misconfiguration) would let the server correlate its own client envelopes with its own partial unblinds. The ClientId / ServerId type separation guards against this at the type level.
  • Traffic analysis across rounds. ZIPNet per se does not defend against a global passive adversary who correlates client connection times across many rounds. This is a transport-level concern and is inherited from mosaik’s iroh transport.
  • Universe-level co-location. Running on the shared mosaik universe (Shape B in design-intro) does not weaken the anonymity argument: admission to the committee group and to the public write-side streams is gated per-instance by TicketValidator composition (BundleValidator<K> today, + Tdx::new().require_mrtd(...) in the TDX path). A peer on the universe who does not present the expected bundle — or MR_TD — is not admitted to the bond, and therefore cannot submit a Command, a partial, or a client envelope. The universe topology is a discovery-scope decision, not a trust-scope decision.

Denial-of-service surface

| Attacker | Attack | Effect |
|---|---|---|
| Compromised TEE | Flood envelopes | Aggregator backpressures, drops lagging stream senders (mosaik TooSlow code 10_413) |
| Compromised aggregator | Omit / delay aggregates | Rounds stall until the committee’s round_deadline fires |
| Compromised committee server | Omit partial | Round never finalizes; operator intervenes or the server is rotated out |
| Compromised committee server | Submit malformed partial | Broadcast is garbage for this round; next round is clean |
| Network | Drop / delay packets | Raft heartbeats time out, election thrashes, rounds delayed |

All of these are availability issues and none of them break anonymity of past or future rounds.

Roadmap to v2

audience: contributors

These are the simplifications baked into v1 and the planned path to address each. The order here is not the implementation order — it is the order in which each change affects the external behavior of the system.

Footprint scheduling

v1: deterministic slot per (client, round) via keyed-blake3 mod num_slots. Per-client collision probability ≈ N / num_slots; at N = 8, num_slots = 64 that’s ~12%.
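A quick sanity check of that figure, assuming each client’s keyed hash lands in an independent uniform slot (the exact per-client probability is slightly below the coarse N / num_slots bound):

```rust
// Back-of-envelope check of the quoted collision figure, assuming each of
// the N clients hashes to an independent uniform slot out of num_slots.
fn main() {
    let n = 8u32;
    let num_slots = 64u32;

    // Probability that a given client shares its slot with at least one
    // of the other N-1 clients.
    let per_client = 1.0 - (1.0 - 1.0 / num_slots as f64).powi(n as i32 - 1);

    // The coarse approximation quoted in the text.
    let approx = n as f64 / num_slots as f64;

    println!("per-client exact ≈ {:.3}", per_client); // ≈ 0.104
    println!("N/num_slots      = {:.3}", approx);     // 0.125
    assert!((per_client - 0.104).abs() < 0.001);
}
```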

v2: the paper’s two-channel scheduling (§3.2). A side channel of 4 * N slots holds footprint reservations. Clients pick a random slot and an f-bit random footprint each round, write the footprint into the scheduling vector, and in round r+1 use the assigned message slot only if their footprint round-tripped unchanged.

Implementation shape: add a second RoundParams::num_sched_slots and a second broadcast vector, run the same HKDF-AES pad derivation against a distinct label "zipnet/pad/sched/v1". The CommitteeMachine consumes two aggregates per round (message + schedule) and splits the final broadcast into two halves. WIRE_VERSION bump: 1 → 2.

Cover traffic

v1: non-talking clients omit their envelope entirely. This narrows the anonymity set to active talkers.

v2: clients with no message produce a pure-pad envelope (msg_i = 0, all pads XORed in). The aggregator and committee process these indistinguishably from talker envelopes. The only visible change at the state-machine level: participants grows to include cover traffic.

This is a tiny code change on the client (just remove the “skip when message == None” early return in client::seal) plus a policy decision on how often a client should send cover. Staying cheap on the server was a first-class design goal of the paper; v2 makes it concrete.

Ratcheting for forward secrecy

v1: every round reruns HKDF-Extract from the same shared_secret. Compromise of the secret compromises all past pads.

v2: at the end of each round, both client and server ratchet:

shared_secret ← HKDF-Extract("zipnet/ratchet/v1", shared_secret);

Past shared secrets are unrecoverable from the new one under the PRF assumption. Both sides must step the ratchet in lockstep; the round number acts as the step counter. A committee member rederiving a missed step for a late-joining client catches up by evaluating the KDF once per missed round.

For the client, the ratchet state sits in the TEE’s sealed storage (v2 TDX path). For the mock client, it sits in RAM — so a restart re-derives an independent key tree, which is fine.
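The lockstep/catch-up property is easy to demonstrate. The sketch below uses std’s DefaultHasher as a loudly-labeled stand-in for HKDF-Extract — it is not a cryptographic KDF, but any deterministic one-way step exhibits the same behavior:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for shared_secret ← HKDF-Extract("zipnet/ratchet/v1", shared_secret).
// DefaultHasher is NOT a KDF; it only illustrates the stepping discipline.
fn ratchet_step(secret: u64) -> u64 {
    let mut h = DefaultHasher::new();
    ("zipnet/ratchet/v1", secret).hash(&mut h);
    h.finish()
}

fn main() {
    let s0 = 0xDEAD_BEEF_u64;

    // Client steps the ratchet once per round, for 10 rounds.
    let mut client = s0;
    for _ in 0..10 { client = ratchet_step(client); }

    // A server that last stepped at round 4 catches up to round 10 by
    // evaluating the KDF once per missed round — no other state needed.
    let mut server = s0;
    for _ in 0..4 { server = ratchet_step(server); }
    for _ in 0..6 { server = ratchet_step(server); }

    assert_eq!(client, server); // both sides land on the round-10 secret
    println!("round-10 secret: {server:016x}");
}
```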

Multi-tier aggregators

v1: single aggregator.

v2: arbitrary rooted tree of aggregators. Each leaf-level aggregator XOR-folds envelopes from its assigned clients and pushes up to its parent; the parent folds and pushes to the root; the root publishes to the committee. Filtering uses require(|peer| peer.tags().contains(&tag!("aggregator.tierN"))) and with_tags("aggregator.tierN+1") on online_when.

Each aggregator-to-aggregator link uses a dedicated stream (we already have the pattern in AggregateToServers). No state-machine change required because the root aggregator still emits one AggregateEnvelope per round.

Liveness resilience

v1: any committee server being offline halts round finalization — the state machine waits for len(partials) == len(header.servers).

v2 options:

  • Relaxed finalization. Finalize after t-of-n partials, where t is a configured threshold. A missing server’s pads are retroactively removed via a published “apology partial” submitted by any honest server that knows the remaining clients’ pads. (This requires publishing the missing server’s pad seeds under the committee’s shared secret, which defeats the point — so it needs MPC.)

  • Aggregator-sponsored timeout. The leader signals a timeout, bumps the RoundId, and opens a fresh round without the stuck server’s pads. This is simpler but loses the anonymity contribution of the absent honest server.

The first option is research-complete but not engineering-complete; the second option is trivial and is the candidate for v2.

TDX attestation in the critical path

v1: tee-tdx feature exists but the committee accepts any peer with a well-formed ClientBundle ticket (our BundleValidator only checks id/dh_pub consistency).

v2: on each committee admission path add .require_ticket(Tdx::new() .require_mrtd(expected_mrtd)) so only enclave-verified peers can participate. The expected MR_TD comes from the reproducible image build. ClientRegistry writes only land if the bundle’s PeerEntry also carries a valid TDX quote.

This is additive to the existing BundleValidator and stacks cleanly thanks to mosaik’s multi-require_ticket support.

State archival and snapshot sync

v1: CommitteeMachine.broadcasts grows unbounded in RAM; LogReplaySync is used for catch-up.

v2: implement a StateSync strategy that snapshots the last N broadcasts + the current InFlight and emits a blob. Externalize the archival of rotated broadcasts to a sink collection or a replicated object store.

Rate-limiting tags

v1: absent. A malicious client can flood envelopes.

v2: per the paper’s §3.1 sketch, each envelope carries PRF_k(ctr || epoch) where ctr is attested by the enclave. The aggregator dedupes by tag per epoch. This requires the TEE path to have landed first.
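A minimal sketch of the aggregator-side dedupe, assuming the envelope already carries the enclave-attested tag. Type and method names here are illustrative, not the actual zipnet API:

```rust
use std::collections::HashSet;

// Hypothetical v2 rate limiter: admit at most one envelope per tag per
// epoch. The tag itself is PRF_k(ctr || epoch), attested by the enclave;
// here we only model the dedupe, not the PRF.
struct RateLimiter {
    epoch: u64,
    seen: HashSet<[u8; 32]>,
}

impl RateLimiter {
    fn new(epoch: u64) -> Self {
        Self { epoch, seen: HashSet::new() }
    }

    /// Returns true if the envelope is fresh for this epoch.
    fn admit(&mut self, epoch: u64, tag: [u8; 32]) -> bool {
        if epoch != self.epoch {
            // New epoch: the tag space resets.
            self.epoch = epoch;
            self.seen.clear();
        }
        self.seen.insert(tag) // false on replay within the epoch
    }
}

fn main() {
    let mut rl = RateLimiter::new(0);
    let tag = [7u8; 32];
    assert!(rl.admit(0, tag));  // first use: admitted
    assert!(!rl.admit(0, tag)); // same tag, same epoch: dropped
    assert!(rl.admit(1, tag));  // next epoch: admitted again
    println!("dedupe ok");
}
```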

Scheduling vector equivocation protection

v1: a single leader publishes LiveRound into LiveRoundCell; divergent schedules would be detectable via the schedule_hash input to the KDF (if we included it — we pass NO_SCHEDULE in v1). Once footprint scheduling lands, every client must derive schedule_hash from the same broadcast schedule as the committee, or pads disagree and the broadcast is noise (correct failure mode per paper §3.2).

Versioning under stable instance names

v1: every incompatible change (any WIRE_VERSION or signature() bump) produces a new GroupId. Under the UNIVERSE + instance-salt design described in design-intro, this effectively makes the old instance a ghost and forces consumers to re-pin. If "acme.mainnet" is meant to be an operator-level identity that outlives schema changes, v1 cannot deliver it.

v2 must pick one of two reconciliation strategies, documented in design-intro — Versioning under stable instance names:

  • Version-in-name. acme.mainnet-v2 retires acme.mainnet. Clean, but forces a consumer-side release per bump.
  • Lockstep releases. The instance name stays stable across versions and operators + consumers cut matching releases against a shared deployment crate. Avoids id churn at the cost of tighter release-cadence coupling.

Neither is chosen yet. The call is forced the first time a v2 milestone above lands in a production deployment.

Cross-service composition

v1: zipnet is the only service we ship on zipnet::UNIVERSE.

v2: as sibling services (multisig signer, secure storage, attested oracles) land on the same universe, two concerns surface:

  • Catalog noise. Every peer on the universe appears in every agent’s discovery catalog. /mosaik/announce volume scales with the universe, not with the services an agent cares about. The escape hatch is the per-service derived private network for high-churn internal chatter; the residual cost is paid by everyone. If a service’s traffic would dominate the shared network, it belongs behind its own NetworkId — Shape A in design-intro — Two axes of choice — not on the shared one.
  • Cross-service atomicity. “Mix a zipnet message AND rotate a multisig signer” cannot be a single consensus transaction; they are different Groups, possibly with disjoint membership. If a coordination-heavy use case genuinely needs that, the answer is a fourth primitive that is itself a deployment providing atomic composition — not an ad-hoc cross-group protocol.

Optional directory collection (devops convenience)

Not a core feature. Zipnet’s consumer binding path is compile-time name reference plus mosaik peer discovery; no on-network registry is required, and the CLAUDE.md commitment is explicit that one will not be added. However, a shared Map<InstanceName, InstanceCard> listing known deployments may ship as a devops convenience for humans enumerating instances across operators. If built, it must:

  • be documented as a convenience, not a binding path;
  • be independently bindable — the SDK never consults it;
  • not become load-bearing for ACL or attestation decisions.

Flag-in-source as // CONVENIENCE: if it lands, to distinguish it from the // SIMPLIFICATION: v2-deferred markers.

Migration across these milestones

Each milestone above changes WIRE_VERSION or at minimum CommitteeMachine::signature(). Rolling between v1 and an arbitrary v2 milestone is therefore a coordinated “stop all nodes, start with new config” operation — same procedure as rotating the committee secret. We make no attempt at on-the-fly upgrade paths in this prototype.

Extending zipnet

audience: contributors

This chapter covers two kinds of extension:

  1. Extending zipnet itself — new commands, collections, streams, ticket classes, or round-parameter knobs within a zipnet deployment.
  2. Building an adjacent service on the shared universe — a new mosaik-native service (multisig signer, secure storage, attested oracle, …) that coexists with zipnet on zipnet::UNIVERSE and reuses the instance-salt pattern.

The second is the generalisation of the first. The “checklist for a new service” at the end of design-intro is the canonical reference for the second kind; this chapter links to it and concentrates on the concrete how-tos.

Extending zipnet itself

Adding a new command to the committee state machine

  1. Add a variant to Command in crates/zipnet-node/src/committee.rs.
  2. Handle it in apply(). Deterministic only — no I/O, no randomness that isn’t derived from ApplyContext (see Committee state machine — Apply-context usage).
  3. Bump the version tag in CommitteeMachine::signature() (v1 → v2). This re-scopes the GroupId so mismatched nodes cannot bond. This is a breaking change.
  4. Add a Query variant if the new state needs external read access.
  5. Decide who issues the command. If a non-server peer needs to trigger it, add a declare::stream! channel and a side-task in roles::server that feeds it into group.execute.
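Steps 1 and 2 have a simple shape. The sketch below is illustrative only — the real Command and CommitteeMachine live in crates/zipnet-node/src/committee.rs and carry more context; the variant name and fields here are hypothetical:

```rust
// Shape sketch of "add a variant, handle it deterministically in apply()".
#[derive(Debug, Clone)]
enum Command {
    OpenRound { round: u64 },
    // 1. the new variant (hypothetical example)
    SetRoundDeadline { round: u64, deadline_ms: u64 },
}

#[derive(Default, Debug)]
struct Machine {
    current_round: u64,
    round_deadline_ms: Option<u64>,
}

impl Machine {
    // 2. deterministic apply: no I/O, no clocks, no ambient randomness —
    // every replica replaying the same log must reach the same state.
    fn apply(&mut self, cmd: &Command) {
        match cmd {
            Command::OpenRound { round } => {
                self.current_round = *round;
                self.round_deadline_ms = None;
            }
            Command::SetRoundDeadline { round, deadline_ms } => {
                // Reject stale commands deterministically instead of erroring.
                if *round == self.current_round {
                    self.round_deadline_ms = Some(*deadline_ms);
                }
            }
        }
    }
}

fn main() {
    let mut m = Machine::default();
    m.apply(&Command::OpenRound { round: 1 });
    m.apply(&Command::SetRoundDeadline { round: 1, deadline_ms: 5_000 });
    assert_eq!(m.round_deadline_ms, Some(5_000));
    m.apply(&Command::SetRoundDeadline { round: 0, deadline_ms: 9_999 }); // stale: ignored
    assert_eq!(m.round_deadline_ms, Some(5_000));
    println!("apply ok: {m:?}");
}
```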

Adding a new collection

  1. Declare in crates/zipnet-node/src/protocol.rs:

    declare::collection!(
        pub MyMap = mosaik::collections::Map<K, V>,
        "zipnet.collection.my-map",
    );
  2. Decide writer and reader roles. Writers join the collection’s internal Raft group and bear the leadership election cost.

  3. For TDX-gated collections, compose Tdx::new().require_mrtd(...) onto the collection’s require_ticket alongside the existing BundleValidator — see Mosaik integration — TDX gating.

  4. If the new collection is part of the public surface, think twice. Zipnet’s declared public surface is small (write-side + read-side, see Architecture). A new public collection widens the consumer contract; prefer surfacing via Zipnet::bind instead of growing raw declarations.

  5. Once the target per-instance layout lands, the literal string will be replaced by INSTANCE.derive("my-map"); structure the name so the migration is a pure rename.

Adding a new typed stream

  1. Declare in protocol.rs. Prefix predicates with producer / consumer per the direction semantics (Mosaik integration — predicate direction).
  2. Use in a role module: MyStream::producer(&network) / MyStream::consumer(&network) returns concrete typed handles.
  3. If this is a high-churn internal channel (aggregator fan-in, DH gossip), it’s a candidate to live on a derived private network rather than the shared universe — see Architecture — Internal plumbing.

Adding a new TicketValidator

  1. Implement mosaik::tickets::TicketValidator on a fresh type. BundleValidator<K> in crates/zipnet-node/src/tickets.rs is the reference shape.

  2. Pick a TicketClass constant. Keep it human-readable ("zipnet.bundle.server", etc.) — ticket classes are intent-addressed and the string is the intent.

  3. Fold a version tag into signature() the same way BundleValidator does:

    fn signature(&self) -> UniqueId {
        K::CLASS.derive("zipnet.my-validator.v1")
    }

    Bumping v1 → v2 re-scopes the GroupId of every group that stacks this validator. Treat it as a breaking change.

  4. Compose with existing validators via mosaik’s multi-require_ticket — see Mosaik integration — TDX gating for the stacking pattern.

Changing RoundParams

  1. Edit RoundParams::default_v1() in crates/zipnet-proto/src/params.rs.
  2. Bump WIRE_VERSION if the change is semantically meaningful (any client/server disagreement on shape would garble pads otherwise).
  3. CommitteeMachine::signature() already mixes in params fields; every member rederives GroupId and old + new do not bond.
  4. Deploy-time coordination: same procedure as rotating the committee secret.

Adding a TDX attestation requirement

  1. Turn on the tee-tdx feature on zipnet-node, zipnet-server, zipnet-client.

  2. In the deployment-specific main, pre-compute (or hardcode) the expected MR_TD.

  3. Build a validator:

    use mosaik::tickets::Tdx;
    let validator = Tdx::new().require_mrtd(expected_mrtd);
  4. Plumb validator into the server’s run path by stacking it on the committee GroupBuilder::require_ticket and on each collection / stream whose producer you want to TDX-gate.

Swapping the slot assignment function

  1. The slot is picked by zipnet_core::slot::slot_for(client, round, params). Change the body; the caller contract is -> usize.
  2. If you want the footprint scheduling variant, you’ll also want a per-round side channel — see Roadmap — Footprint scheduling.
  3. Deterministic and agreed upon by all nodes. Bump the protocol version tags accordingly.
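The caller contract can be sketched with std types. DefaultHasher below is a stand-in for keyed-blake3 — deliberately so: DefaultHasher is not guaranteed stable across Rust versions, which is exactly why the real slot_for must use a fixed, keyed hash when every node has to agree:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of the slot_for contract. v1 uses keyed-blake3 mod num_slots;
// DefaultHasher here is illustration only (NOT stable across toolchains).
fn slot_for(client_id: &[u8; 32], round: u64, num_slots: usize) -> usize {
    let mut h = DefaultHasher::new();
    (client_id, round).hash(&mut h);
    (h.finish() as usize) % num_slots
}

fn main() {
    let alice = [1u8; 32];
    let s1 = slot_for(&alice, 7, 64);
    // Deterministic: every node computes the same slot for (client, round).
    assert_eq!(s1, slot_for(&alice, 7, 64));
    assert!(s1 < 64);
    println!("alice round-7 slot: {s1}");
}
```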

Running the integration test under heavier parameters

crates/zipnet-node/tests/e2e.rs uses RoundParams::default_v1() and a hardcoded 3-server / 2-client topology. Modify directly; the helpers (cross_sync, run_server, run_client, run_aggregator) are scoped to the test so no cross-cutting refactor is needed.

RUST_LOG=info,zipnet_node=debug cargo test -p zipnet-node --test e2e -- --nocapture

A successful run ends with

zipnet e2e: round r1 finalized with 2/2 messages recovered

Where to put a new role

If you introduce a fourth participant type (say, an “auditor” that archives Broadcasts to cold storage), the idiomatic placement is a new module in crates/zipnet-node/src/roles/ and a sibling crate under crates/zipnet-auditor/ that delegates to it. Follow the zipnet-aggregator binary layout.

Measuring something

Mosaik’s Prometheus metrics are auto-wired; add your own via the metrics crate:

use metrics::{counter, gauge};

counter!("zipnet_rounds_opened_total").increment(1);
gauge!("zipnet_client_registry_size").set(registry.len() as f64);

They will appear at the configured ZIPNET_METRICS endpoint without any scraper-side changes.

Building an adjacent service on the shared universe

Zipnet’s deployment model is a reusable pattern — the full rationale is in design-intro. Any service that wants to coexist on zipnet::UNIVERSE alongside zipnet should reproduce the three conventions:

  1. Instance-salt discipline. Every public id descends from blake3("yourservice." + instance_name). Provide both a compile-time macro and a runtime fn that produce byte-identical output.
  2. A Deployment-shaped convention. Declare the public surface (one or two primitives, ideally) in a single protocol module; export a bind(&Network, instance_name) -> TypedHandles function.
  3. A naming convention, not a registry. Operator → consumer handshake is universe NetworkId + instance name + (if TDX-gated) MR_TD. No on-network advertisement required — mosaik’s standard discovery bonds the sides.
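Convention 1 can be sketched as follows. Zipnet’s real derivation is blake3("zipnet." + instance_name) with .derive(..) chaining; the stand-in hash below only illustrates the two properties that matter — compile-time and runtime paths must agree byte-for-byte, and sub-ids of different instances never collide because everything descends from the instance salt:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake3-based id derivation ("yourservice" is hypothetical).
fn instance_id(service: &str, name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (service, ".", name).hash(&mut h);
    h.finish()
}

fn derive(parent: u64, label: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (parent, label).hash(&mut h);
    h.finish()
}

fn main() {
    // Two deployments of the same service never collide on sub-ids,
    // because every public id descends from the instance salt.
    let mainnet = instance_id("yourservice", "acme.mainnet");
    let preview = instance_id("yourservice", "preview.alpha");
    assert_ne!(derive(mainnet, "submit"), derive(preview, "submit"));

    // Same name → same id, whether derived at compile time or at runtime.
    assert_eq!(mainnet, instance_id("yourservice", "acme.mainnet"));
    println!("acme.mainnet root id: {mainnet:016x}");
}
```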

Walk the checklist for a new service end-to-end before writing any code. The most common mistake is not answering “what happens when StateMachine::signature() bumps?” before shipping.

When Shape B is the wrong call

A service whose traffic would dominate catalog gossip on the shared universe (high-frequency metric streams, bulk replication) belongs behind its own NetworkId — Shape A in design-intro — Two axes of choice. The narrow-public-surface discipline does not rescue a service whose steady-state traffic is inherently loud; at that point the noise cost dominates the composition benefit.

Optional directory collection

If your operator community wants a human-browsable list of known deployments, ship a sibling Map<InstanceName, InstanceCard> as a devops convenience, not as part of the consumer binding path. See Roadmap — Optional directory collection for the discipline.

Glossary

audience: all

Domain terms as they are used in this book and in the source.

Aggregator. The untrusted node that XOR-folds client envelopes for a round into a single AggregateEnvelope and forwards it to the committee. One aggregator in v1; a tree of aggregators in v2. Runs inside zipnet-aggregator.

Any-trust. Security assumption where anonymity holds as long as at least one party in a designated set is honest. The zipnet committee is an any-trust set.

bind. The Zipnet::bind(&Arc<Network>, &str) constructor — the single public path from a mosaik network handle to a typed zipnet handle. Takes an instance name; derives every instance-local ID internally; returns a Zipnet that exposes publish / subscribe / shutdown. See Quickstart — publish and read.

bind_by_id. The Zipnet::bind_by_id(&Arc<Network>, UniqueId) variant of bind, for consumers who have pre-derived the instance UniqueId at compile time via zipnet::instance_id!("name"). The macro and runtime instance_id fn produce identical bytes, so a compile-time bind_by_id and a string bind with the same name land on the same instance.

Bond. mosaik term for a persistent QUIC connection between two members of the same Raft group, authenticated by the shared GroupKey.

Broadcast vector. B = num_slots * slot_bytes bytes of output per round. Default 16 KiB. Each finalized round commits one broadcast vector to the Broadcasts collection.

Client. A node that authors messages and seals them into envelopes inside a TEE. In the mock path (v1 default), the TEE is replaced by a plain process; see Security checklist.

ClientBundle. Public pair (ClientId, dh_pub) gossiped via a discovery ticket so servers can derive per-client pads.

ClientId. 32-byte blake3-keyed hash of the client’s X25519 public key. Stable as long as the client’s DH secret is stable.

Committee. The set of any-trust servers that collectively unblind the round’s aggregate. In v1 this is a Raft group with a bespoke CommitteeMachine state machine. One committee per instance.

Cover traffic. Client envelopes carrying a zero message, sent to widen the anonymity set at negligible extra cost. The SDK sends cover envelopes by default when an instance is bound but idle. See Publishing messages.

DC net. Dining Cryptographers network — the XOR-based anonymous broadcast construction zipnet descends from. See Chaum 1988.

DH secret. An X25519 static secret held by a client or a server. Compromise of one party’s DH secret only affects that party; compromise of every committee server’s DH secret breaks anonymity.

Encrypted mempool. The canonical motivating deployment shape: TEE-attested wallets seal transactions and publish them through zipnet; builders read the ordered log of sealed transactions; no single party can link a transaction back to its sender. Zipnet supplies the anonymous publish channel; the encryption of the payload itself (threshold, TEE-unsealing, etc.) sits on top.

Envelope. A client’s per-round contribution: a broadcast-vector-sized buffer containing message ‖ tag at the client’s slot and zeros elsewhere, XORed with the sum of the client’s per-server pads.

Falsification tag. A keyed-blake3 output of the plaintext message, written alongside the message in the same slot. Verifies that a slot’s payload is intact (§3, “ROMHash” in the paper).

Fold. The aggregator’s XOR combine of all envelopes for a round.

Footprint scheduling. The paper’s two-channel slot reservation scheme (§3.2). v2 feature.

GroupId. mosaik’s 32-byte identifier for a Raft group, derived from the GroupKey, consensus config, state machine signature, and any TicketValidator signatures. Fully determined by the instance name plus the deployment crate version.

GroupKey. Shared committee secret. Admission gate for joining the committee’s Raft group.

Instance. A single zipnet deployment — one committee, one ACL, one set of round parameters — sharing a universe with other zipnet instances and other mosaik services. Operators stand up and retire instances; users bind to them by name.

Instance name. A short, stable, namespaced string that identifies an instance within a universe (e.g. acme.mainnet, preview.alpha, dev.ops). Folds deterministically into every instance-local ID. Flat namespace per universe — collisions are silent, so namespace defensively (<org>.<purpose>.<env>).

instance_id. Runtime function and macro on the zipnet facade that derive an instance’s root UniqueId from its name. zipnet::instance_id("acme.mainnet") and zipnet::instance_id!("acme.mainnet") produce identical bytes — both expand to blake3("zipnet.acme.mainnet"). Sub-IDs chain off it via .derive("submit" | "broadcasts" | "committee" | …).

LiveRound. The currently-open round’s header: round id, client roster snapshot, server roster snapshot.

mosaik. The Flashbots library on which this prototype is built. Provides discovery, typed streams, consensus groups, and replicated collections. See docs.mosaik.world.

MR_TD. 48-byte Intel TDX guest measurement. Published by the operator out of band; pinned by clients; enforced by the mosaik Tdx bonding layer. See TEE-gated deployments.

Pad. The output of the KDF for a given (client, server, round) triple; length B. XOR of pads is the DC-net’s one-time key.

Partial unblind. One committee server’s XOR of its per-client pads over the round’s participant set. XORing all partials into the aggregate yields the broadcast.

PeerId. mosaik identifier for a node: its ed25519 public key (via iroh). Different from ClientId / ServerId (which are DH-key-based).

Raft. The consensus protocol used by the committee group. mosaik uses a modified Raft with abstention votes.

Ratchet. Stepping the shared secret forward one round; shared_secret ← HKDF(shared_secret). Provides forward secrecy. v2 feature.

Round. One execution of the protocol: OpenRound → SubmitAggregate → N_S × SubmitPartial → finalize.

RoundId. Monotonically increasing integer; r0, r1, ....

RoundParams. Static shape of a round: num_slots, slot_bytes, tag_len, wire_version. Immutable for the lifetime of an instance.

ServerBundle. Public pair (ServerId, dh_pub) gossiped via a discovery ticket so clients can derive per-server pads.

ServerId. 32-byte blake3-keyed hash of a committee server’s X25519 public key.

Slot. One slot_bytes-byte region of the broadcast vector. One active client per slot per round (modulo deterministic collisions).

State machine signature. UniqueId mixed into GroupId derivation. Bumped whenever apply semantics or Command shape changes.

TEE. Trusted Execution Environment. Intel TDX in the production path; mock in the v1 default path.

TDX. Intel Trust Domain Extensions — the TEE zipnet targets. Guest measurement is MR_TD. See TEE-gated deployments.

Ticket. Opaque bytes attached to a signed PeerEntry in mosaik discovery. Zipnet uses tickets of classes zipnet.bundle.client and zipnet.bundle.server to distribute DH pubkeys, and relies on mosaik’s require_ticket for per-instance ACL on the public primitives.

Universe. The shared mosaik NetworkId on which zipnet (and any other mosaik service) runs. The zipnet facade exports the constant zipnet::UNIVERSE = unique_id!("mosaik.universe"). Many instances, and many unrelated services, coexist on one universe.

XOR. Exclusive-or over equal-length byte buffers. The DC-net’s fundamental operation.

Paper cross-reference

audience: contributors

Pointer table from the prototype’s source modules to the ZIPNet paper (eprint 2024/1227). Section / algorithm / figure numbers are from the camera-ready version. Crate paths are workspace-relative.

| Paper item | Prototype location |
|---|---|
| §2.1 “Chaum’s DC net” (background) | zipnet-proto::xor (crates/zipnet-proto/src/xor.rs) |
| §2.2 “ZIPNet overview” (Figure 1b) | crates/zipnet-node/src/lib.rs diagram + Architecture |
| §3 “Falsifiable TEE assumption” | zipnet-proto::crypto::falsification_tag (crates/zipnet-proto/src/crypto.rs) |
| §3 “Setup” (PKI, attestation, sealed key DB) | zipnet-proto::keys + zipnet-node::tickets::BundleValidator (crates/zipnet-node/src/tickets.rs) |
| §3 “Sealed data” | v2 sealed storage in TEE; not implemented in v1 |
| §3.1 “Rate limiting tags” | v2 item; not implemented |
| §3.2 “Scheduling” (footprint) | v2 item; not implemented (see roadmap) |
| §3.3 “Adversary and network model” | Threat model |
| §3.3 “Security argument” | Threat model — anonymity sketch |
| Algorithm 1 (client seal) | zipnet-core::client::seal (crates/zipnet-core/src/client.rs) |
| Algorithm 2 (aggregator fold) | zipnet-core::aggregator::RoundFold (crates/zipnet-core/src/aggregator.rs) |
| Algorithm 3 (server partial + finalize) | zipnet-core::server::partial_unblind + zipnet-core::server::finalize (crates/zipnet-core/src/server.rs) |
| Appendix A (anonymous broadcast definition) | inherited — the prototype does not reprove it |

Crate responsibilities

The workspace splits the paper’s constructions along a purity boundary (see Crate map):

| Crate | Paper content | I/O? |
|---|---|---|
| zipnet-proto | Wire types, keys, XOR, falsification tag primitive | No |
| zipnet-core | Algorithms 1 / 2 / 3 as pure functions over zipnet-proto types | No |
| zipnet-node | The mosaik integration — CommitteeMachine, role event loops, TicketValidator | Yes |
| zipnet-server / zipnet-aggregator / zipnet-client | Thin CLI wrappers around zipnet-node::roles::{server, aggregator, client} | Yes |
| zipnet | SDK facade (Zipnet::bind, UNIVERSE, instance_id!); wraps zipnet-node for external consumers | Yes |

zipnet-proto and zipnet-core do not import mosaik or tokio; if a paper construction reaches for either, it is in the wrong crate.

Notation

The paper uses capital N (total users), N_S (servers), |m| (slot bytes), B (broadcast vector bytes). The prototype uses lowercase n / num_slots / slot_bytes / broadcast_bytes in code and generally follows the paper’s naming in comments.

Deliberate deviations from the paper

  • No schedule hash in v1. The paper mixes `publishedSchedule` into the KDF salt. The prototype passes a constant `NO_SCHEDULE = [0u8; 32]` in v1 and will replace it with the real schedule hash when footprint scheduling lands. Binding the schedule into the KDF is already plumbed (`crypto::kdf_salt` takes it as an argument), so the upgrade is a caller-site change.
  • Tag is keyed-blake3, not HMAC. The paper writes “ROMHash” informally; the prototype picks keyed-blake3 with a fixed domain-separating label for performance. Both are PRFs under standard assumptions, so there is no security difference relative to the paper’s ROM-based argument.
  • No traitor tracing protocol. The paper’s §3 argues that any malformed message flips hash bits and is detected with overwhelming probability. v1 only checks tags on observation; an adversarial client writing to an unused slot is visible via tag mismatch but not attributed. This matches the paper’s “falsifiable trust assumption” but does not implement the §3.1 rate-limiting PRF tags.
  • No scheduling broadcast channel. The paper runs a second DC net for reservations (§3.2). v1 runs only the message channel.
  • Instance namespacing replaces paper-implicit single-deployment identity. The paper treats a ZIPNet committee as a single global entity. The prototype runs many instances side by side on a shared mosaik universe, each with its own salt (see Designing coexisting systems on mosaik). No paper construction is changed by this; every derivation folds `instance_id` in where the paper has an implicit single “deployment” constant.
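The keyed-tag deviation can be sketched concretely. blake3 is not in the Python standard library, so this illustration substitutes `hashlib.blake2b`’s keyed mode, using its `person` parameter as the domain-separating label; the `"zipnet.tag"` label and the function name are illustrative, not the prototype’s actual values.

```python
import hashlib

def falsification_tag(key: bytes, message: bytes) -> bytes:
    """Keyed-hash tag with a fixed domain-separating label.

    Stand-in for the prototype's keyed-blake3: blake2b keyed mode,
    with `person=` carrying the domain label. Illustrative only.
    """
    return hashlib.blake2b(
        message, key=key, person=b"zipnet.tag", digest_size=32
    ).digest()

k = bytes(32)                     # all-zero demo key
t1 = falsification_tag(k, b"hello")
t2 = falsification_tag(k, b"hello")
t3 = falsification_tag(bytes([1] * 32), b"hello")
assert t1 == t2                   # deterministic under one key
assert t1 != t3                   # a different key gives a different tag
```

Like HMAC, a keyed hash in this shape is a PRF under standard assumptions; the tag reveals nothing about the message without the key.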

Environment variables

audience: both

Variables that every binary respects, plus role-specific ones. All are optional unless marked Required. Each value can be passed either as an env var or as the corresponding CLI flag; when both are set, the flag wins (`clap(env = "...")` only supplies a fallback when the flag is absent).

Users do not read this page — the SDK takes no env vars. This is an operator reference. When it diverges from what a binary currently parses, the binary is lagging the documented deployment model; align the binary to this page, not the other way around.

Common to every binary

| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_INSTANCE` | `--instance` | Required | Instance name for this deployment (e.g. `acme.mainnet`). Folds into committee `GroupId`, submit `StreamId`, broadcasts `StoreId`. All processes of one deployment must share this value. |
| `ZIPNET_UNIVERSE` | `--universe` | `zipnet::UNIVERSE` (`mosaik.universe`) | Override the shared mosaik universe `NetworkId`. Set only for isolated federations; leave unset for normal deployments. |
| `ZIPNET_BOOTSTRAP` | `--bootstrap` | (none) | Comma- or repeat-flag-separated `PeerId`s on the shared universe to dial on startup. Universe-level, not per-instance. |
| `ZIPNET_METRICS` | `--metrics` | (none) | Prometheus exporter bind address, e.g. `0.0.0.0:9100`. |
| `ZIPNET_SECRET` | `--secret` | (random) | Seed for this node’s iroh secret. Anything not 64-hex is blake3-hashed. Recommended on committee servers and the aggregator for a stable `PeerId`. |
| `RUST_LOG` | — | `info,zipnet_node=debug` | Standard `tracing_subscriber` filter. |

zipnet-server

| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_COMMITTEE_SECRET` | `--committee-secret` | Required | Shared committee admission secret. Treated as a root credential — all committee servers of the same instance must share this value; clients and the aggregator must not have it. |
| `ZIPNET_MIN_PARTICIPANTS` | `--min-participants` | 1 | Minimum registered clients before the leader opens a round. |
| `ZIPNET_ROUND_PERIOD` | `--round-period` | 2s | How often the leader attempts to open a new round. |
| `ZIPNET_ROUND_DEADLINE` | `--round-deadline` | 6s | How long a round may stay open before the leader force-advances. |

zipnet-aggregator

| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_FOLD_DEADLINE` | `--fold-deadline` | 2s | Time window after a round opens in which the aggregator accepts envelopes. |

zipnet-client

| Variable | CLI flag | Default | Description |
|---|---|---|---|
| `ZIPNET_MESSAGE` | `--message` | (none) | UTF-8 message to seal each round. Omit to run as cover traffic. |
| `ZIPNET_CADENCE` | `--cadence` | 1 | Talk every Nth round (1 = every round). |

Duration syntax

The duration parsers accept `Nms`, `Ns`, `Nm` (e.g. `500ms`, `2s`, `1m`). Hours / days are not supported; if you need them, file an issue.
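A minimal sketch of this grammar, written against the documented rule (not the binaries’ actual parser):

```python
import re

# The three documented units; hours and days deliberately absent.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}

def parse_duration(text: str) -> float:
    """Return seconds for inputs like '500ms', '2s', '1m'."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]

assert parse_duration("500ms") == 0.5
assert parse_duration("2s") == 2.0
assert parse_duration("1m") == 60.0
```

Anything outside the `Nms` / `Ns` / `Nm` shape (including `1h` or bare numbers) is rejected rather than guessed at.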

Secret syntax

All “secret” style inputs (`ZIPNET_SECRET`, `ZIPNET_COMMITTEE_SECRET`) follow the same rule:

  • Exactly 64 hex characters → decoded as 32 raw bytes.
  • Anything else → blake3-hashed into 32 bytes.

This matches mosaik’s own secret-key handling, so operators can reuse whatever seed format they already have (e.g. `openssl rand -hex 32`).
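The rule can be stated as a few lines of code. This sketch substitutes stdlib `blake2b` for blake3 (which Python does not ship) purely to illustrate the hex-or-hash branch:

```python
import hashlib

def parse_secret(value: str) -> bytes:
    """Normalize a secret-style input to 32 bytes.

    Exactly 64 hex characters decode to the raw bytes; anything else
    is hashed down to 32 bytes. The prototype hashes with blake3;
    blake2b here is a stdlib stand-in for the same shape.
    """
    if len(value) == 64 and all(c in "0123456789abcdefABCDEF" for c in value):
        return bytes.fromhex(value)
    return hashlib.blake2b(value.encode(), digest_size=32).digest()

raw = parse_secret("00" * 32)     # 64 hex chars -> decoded verbatim
hashed = parse_secret("hunter2")  # anything else -> hashed to 32 bytes
assert raw == bytes(32)
assert len(hashed) == 32
```

The practical upshot: any string works as a seed, but a 64-hex string is taken literally, so `openssl rand -hex 32` output round-trips without hashing.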

Instance derivation

Every instance-local ID is derived from ZIPNET_INSTANCE:

```
INSTANCE   = blake3("zipnet." + ZIPNET_INSTANCE)     // UniqueId
SUBMIT     = INSTANCE.derive("submit")               // StreamId
BROADCASTS = INSTANCE.derive("broadcasts")           // StoreId
COMMITTEE  = INSTANCE.derive("committee")            // GroupKey material
...
```

The consumer-side `zipnet::instance_id!("name")` macro produces the same bytes as the server-side `ZIPNET_INSTANCE=name` derivation, so a typo on either side lands on a `GroupId` nobody serves. The failure mode is `Error::ConnectTimeout` on the client, not a distinct “not found” error — zipnet has no on-network registry.

Two deployments with different ZIPNET_INSTANCE values on the same universe are completely independent committees: disjoint GroupIds, disjoint streams, no crosstalk. Useful for:

  • running dev/staging/prod in one machine pool,
  • running per-tenant deployments on shared hardware,
  • running a public testnet (preview.alpha) alongside production (mainnet).

Instance names share a flat namespace per universe — two operators picking the same name collide in the committee group and neither works correctly. Namespace defensively (`<org>.<purpose>.<env>`, e.g. `acme.mixer.mainnet`).

Universe override (ZIPNET_UNIVERSE)

Default is the shared mosaik universe (`zipnet::UNIVERSE = unique_id!("mosaik.universe")`). Override only when running an isolated federation that intentionally does not share peers with the rest of the mosaik ecosystem. Every server, aggregator, and client of one deployment must agree on this value; consumers of the SDK build against `zipnet::UNIVERSE` unless their code explicitly passes a different `NetworkId` to `Network::new`.

Metrics reference

audience: operators

Every zipnet binary exposes a Prometheus endpoint when ZIPNET_METRICS is set. The table below lists the metrics worth scraping in production. Metrics starting with mosaik_ are emitted by the underlying mosaik library and documented in the mosaik book — Metrics; the ones that are load-bearing for zipnet operations are listed here.

Metrics that are instance-scoped carry an instance label whose value is the operator’s ZIPNET_INSTANCE string (e.g. acme.mainnet). When a host multiplexes several instances (see Operator quickstart — running many instances), every instance-scoped metric is emitted once per instance.

Per-role metrics

Committee server

| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `mosaik_groups_leader_is_local{instance=<name>}` | gauge (0/1) | Whether this node is the Raft leader for the instance | Exactly one `1` across the committee of each instance |
| `mosaik_groups_bonds{peer=<id>,instance=<name>}` | gauge (0/1) | Whether a bond to a specific peer is healthy | `1` for every other committee member of the same instance |
| `mosaik_groups_committed_index{instance=<name>}` | gauge | Highest committed Raft index | Monotonically increasing, step ≈ 2 per round |
| `zipnet_rounds_finalized_total{instance=<name>}` | counter | Rounds this node saw finalize | Increases at ~1 / `ZIPNET_ROUND_PERIOD` |
| `zipnet_partials_submitted_total{instance=<name>}` | counter | Partials this node contributed | Increases 1 per round |
| `zipnet_client_registry_size{instance=<name>}` | gauge | Clients currently registered | Roughly = expected client count |
| `zipnet_server_registry_size{instance=<name>}` | gauge | Servers currently registered | Equals committee size |

The `mosaik_groups_leader_is_local` gauge is the one the operator quickstart tells you to check when bringing a new instance up — exactly one committee node should report `1` per instance.

Aggregator

| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `mosaik_streams_consumer_subscribed_producers{stream=<id>,instance=<name>}` | gauge | Number of producers this consumer is attached to | = client count for `ClientToAggregator` |
| `mosaik_streams_producer_subscribed_consumers{stream=<id>,instance=<name>}` | gauge | Number of consumers attached to this producer | = committee size for `AggregateToServers` |
| `zipnet_aggregates_forwarded_total{instance=<name>}` | counter | Aggregates sent to the committee | ≈ rounds finalized |
| `zipnet_fold_participants{round=<r>,instance=<name>}` | histogram | Clients per folded round | Depends on your client count |
| `zipnet_clients_registered_total{instance=<name>}` | counter | Client bundles mirrored into `ClientRegistry` | Grows to client count, then plateaus |

Client

| Metric | Kind | Meaning | Healthy value |
|---|---|---|---|
| `zipnet_envelopes_sent_total{instance=<name>}` | counter | Envelopes sealed and pushed | Increases by 1 per talk round |
| `zipnet_envelope_send_errors_total{instance=<name>}` | counter | `send` failures | Ideally 0 |
| `zipnet_client_registered{instance=<name>}` | gauge (0/1) | Whether our bundle is in `ClientRegistry` | `1` after the first few seconds |

Metrics that indicate trouble

| Metric | Fires when | First action |
|---|---|---|
| `mosaik_groups_leader_is_local` is `1` on zero or ≥ 2 nodes of one instance for > 1 min | Split-brain or no leader | Incident response — split-brain |
| `mosaik_streams_consumer_subscribed_producers` drops to 0 on the aggregator | Clients disconnected | Check client-side logs for bootstrap failures |
| `zipnet_aggregates_forwarded_total` flat for > 3 × `ZIPNET_ROUND_PERIOD` | Aggregator stuck OR committee cannot open rounds | Incident response — stuck rounds |
| `zipnet_server_registry_size` < committee size for > 30 s | A committee server failed to publish | Check that server’s boot log |
| `mosaik_groups_committed_index` frozen | Raft stalled | Check clock skew, network partition |

Every trouble alert should be scoped by instance so multi-instance hosts do not conflate a stuck testnet with a stuck production committee.

Recording rules for Prometheus

Useful derived series (all scoped by instance):

```
# Round cadence per instance
rate(zipnet_rounds_finalized_total[5m])

# Average participants per round per instance
  rate(zipnet_fold_participants_sum[5m])
/ rate(zipnet_fold_participants_count[5m])

# Aggregator fold saturation (clients dropped by the deadline)
(
  rate(zipnet_clients_registered_total[5m])
  -
  rate(zipnet_fold_participants_sum[5m]) / rate(zipnet_rounds_finalized_total[5m])
)
```

Logs that should never fire (without a concurrent alert)

  • `rival group leader detected` on any committee server.
  • `SubmitAggregate with bad length` / `SubmitPartial with bad length` in a committee log.
  • `failed to mirror LiveRoundCell persistently`.
  • `committee offline — aggregate dropped` — either the committee is down or bundle tickets never replicated.

If any of these fire without a concurrent incident, treat it as a protocol invariant break and escalate to the contributor on-call.