Running the aggregator
audience: operators
The aggregator receives every client envelope for the live round,
XORs them into a single AggregateEnvelope, and forwards that to the
committee. It is untrusted for anonymity — compromising it only
affects liveness and round-membership accounting, never whether a
message can be linked to its sender. It is trusted for liveness:
if it stops, rounds stop.
In v1 there is exactly one aggregator per instance. It does not need to run inside a TDX guest (though you can if your ops story prefers uniformity).
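The fold itself is bytewise XOR over equal-length broadcast vectors. A minimal sketch of that step (the function name and plain-`bytes` envelopes are illustrative assumptions, not the actual zipnet API; real envelopes carry framing and metadata):

```python
def fold_envelopes(envelopes: list[bytes], vector_size: int = 16 * 1024) -> bytes:
    """XOR every client envelope for a round into one aggregate vector.

    Hypothetical sketch: treats each envelope as a raw broadcast vector
    of exactly `vector_size` bytes.
    """
    agg = bytearray(vector_size)
    for env in envelopes:
        if len(env) != vector_size:
            raise ValueError("envelope has wrong vector size")
        for i, byte in enumerate(env):
            agg[i] ^= byte
    return bytes(agg)
```

XOR is what makes the aggregator safe to leave outside the trust boundary: the fold is content-oblivious, so a compromised aggregator can drop or withhold envelopes (liveness) but learns nothing that links a message to a sender.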
One-shot command
ZIPNET_INSTANCE="acme.mainnet" \
ZIPNET_SECRET="stable-agg-seed" \
ZIPNET_FOLD_DEADLINE=2s \
./zipnet-aggregator --bootstrap <peer_id_of_a_committee_server>
Environment variables
| Variable | Meaning | Notes |
|---|---|---|
| ZIPNET_INSTANCE | Instance name this aggregator serves | Required. Must match the committee’s. Typos show up as ConnectTimeout at round-open time. |
| ZIPNET_UNIVERSE | Universe override | Optional; leave unset to use the shared universe. |
| ZIPNET_SECRET (or --secret) | Seed for this aggregator’s stable PeerId | Strongly recommended: clients often use the aggregator as a discovery bootstrap. |
| ZIPNET_BOOTSTRAP | Peer IDs to dial on startup | At least one committee server on a cold universe. |
| ZIPNET_FOLD_DEADLINE | Time window to collect envelopes after a round opens | Default 2s. Raising it admits slower clients at the cost of latency. |
| ZIPNET_METRICS | Prometheus bind address | Optional. |
The aggregator does not take ZIPNET_COMMITTEE_SECRET. It is
outside the committee’s trust boundary by design; do not give it
that secret even if your secret store makes it convenient.
What a healthy aggregator log looks like
INFO zipnet_node::roles::common: zipnet up: network=<universe> instance=acme.mainnet peer=4c210e8340... role=5ef6c4ada2...
INFO zipnet_node::roles::aggregator: aggregator booting; waiting for collections to come online
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r1 participants=3
INFO zipnet_node::roles::aggregator: aggregator: forwarded aggregate to committee round=r2 participants=3
...
Capacity planning
Per round the aggregator:
- Receives N × B bytes from clients, where N is the number of active clients and B is the broadcast vector size (defaults to 16 KiB).
- Sends one aggregate of size B to every committee server.
If the committee is 5 servers and the instance has 1000 clients with default parameters:
- Inbound per round ≈ 1000 × 16 KiB = 16 MiB.
- Outbound per round ≈ 5 × 16 KiB = 80 KiB.
At a 2 s round cadence, inbound averages 64 Mbit/s. Provision accordingly.
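The arithmetic above can be sanity-checked with a few lines (the helper name is illustrative; note the figures in the text round 15.625 MiB up to 16 MiB and 62.5 binary Mbit/s up to 64 Mbit/s):

```python
KIB = 1024
MIB = 1024 * KIB

def round_traffic(clients: int, vector_bytes: int, committee: int) -> tuple[int, int]:
    """Bytes in and out of the aggregator per round."""
    inbound = clients * vector_bytes    # one envelope of B bytes per client
    outbound = committee * vector_bytes # one B-byte aggregate per committee server
    return inbound, outbound

inbound, outbound = round_traffic(clients=1000, vector_bytes=16 * KIB, committee=5)
round_period_s = 2
print(inbound / MIB)                                  # 15.625 MiB in per round
print(outbound / KIB)                                 # 80.0 KiB out per round
print(inbound * 8 / round_period_s / MIB)             # 62.5 Mbit/s average inbound
```

Plugging in your own client count and vector size gives the provisioning target; the inbound side dominates, and it scales linearly with N.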
Graceful shutdown
Send SIGTERM. Envelopes that had not yet been folded into the
current round’s aggregate are dropped; the affected clients retry
automatically on the next round.
Because the aggregator is a single point of failure for liveness in
v1, plan restarts against your monitoring: a round stall of
3 × ROUND_PERIOD + ROUND_DEADLINE triggers the stuck-round alert
documented in Monitoring.
What if I want two aggregators?
Not supported in v1. Running two on the same instance name gets you two processes competing for the submit stream, not load-balancing. If you need redundancy today, fail over with a warm-standby host behind a process supervisor — not two live aggregators. A multi-tier aggregator tree is sketched in Roadmap to v2 — Multi-tier aggregators.
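A warm standby can be as simple as the same unit installed on a second host and left stopped until your failover tooling starts it. A hypothetical systemd sketch (paths, the env file, and the restart policy are assumptions; reuse the primary’s ZIPNET_SECRET so the standby comes up with the same PeerId clients already bootstrap against):

```ini
# /etc/systemd/system/zipnet-aggregator.service  (standby host; kept stopped until failover)
[Unit]
Description=zipnet aggregator (acme.mainnet)
After=network-online.target
Wants=network-online.target

[Service]
Environment=ZIPNET_INSTANCE=acme.mainnet
Environment=ZIPNET_FOLD_DEADLINE=2s
# ZIPNET_SECRET lives here, identical to the primary's, so the PeerId is stable across failover.
EnvironmentFile=/etc/zipnet/aggregator.env
ExecStart=/usr/local/bin/zipnet-aggregator --bootstrap <peer_id_of_a_committee_server>
Restart=on-failure
RestartSec=2

[Install]
WantedBy=multi-user.target
```

Never start this unit while the primary is running: that recreates the two-live-aggregators situation described above.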
See also
- Running a committee server
- Incident response — aggregator crash-loop, OOM, partition handling.
- Monitoring and alerts — aggregator-relevant metrics.