Troubleshooting from the user side
audience: users
Failure modes you can observe from your own agent, mapped to the SDK’s error enum and the fastest check for each.
The error enum
pub enum Error {
    WrongUniverse { expected: mosaik::NetworkId, actual: mosaik::NetworkId },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}
Five variants. The two you will hit most in development are
ConnectTimeout and WrongUniverse. Everything else is either a
real runtime condition or lower-level plumbing surfaced through
Protocol.
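If you route on these variants, an exhaustive match keeps a new failure mode from slipping through silently. A sketch with the enum stubbed locally so it stands alone (the real type lives in the zipnet crate; NetworkId is simplified to a String here for illustration):

```rust
// Local stub of zipnet::Error, matching the shape shown above.
#[derive(Debug)]
enum Error {
    WrongUniverse { expected: String, actual: String },
    ConnectTimeout,
    Attestation(String),
    Shutdown,
    Protocol(String),
}

// One line of triage per variant; an exhaustive match means the
// compiler flags this function if the enum ever grows.
fn describe(err: &Error) -> &'static str {
    match err {
        Error::WrongUniverse { .. } => "network built on the wrong universe",
        Error::ConnectTimeout => "no peer serving this deployment bonded in time",
        Error::Attestation(_) => "TDX attestation failed",
        Error::Shutdown => "handle is closing",
        Error::Protocol(_) => "lower-level mosaik/protocol failure",
    }
}

fn main() {
    println!("{}", describe(&Error::ConnectTimeout));
}
```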
Symptom: a Zipnet::<D>::* constructor returns ConnectTimeout
This is the single most common dev-time error. It means the SDK could not bond to a peer serving your deployment within the connect deadline. In descending order of likelihood:
1. Config mismatch with the operator
Every field in your Config — name, window, init — plus your
datum’s TYPE_TAG and WIRE_SIZE folds into the deployment’s
on-wire identity. A one-character difference in the name, a
different ShuffleWindow preset, or a stale init salt produces a
completely different id and nobody is serving it.
Fix: double-check every field against the operator’s handoff.
Prefer pinning the Config as a compile-time constant so the
fingerprint is immutable at source:
use zipnet::{Config, ShuffleWindow, Zipnet};

const ACME_MAINNET: Config = Config::new("acme.mainnet")
    .with_window(ShuffleWindow::interactive())
    .with_init([
        0x7f, 0x3a, 0x9b, 0x1c, /* … operator-published bytes … */ 0x00,
    ]);

// Print the derived id on both sides to confirm agreement.
println!("{}", Zipnet::<Note>::deployment_id(&ACME_MAINNET));
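To build intuition for why any mismatch is fatal, here is a toy stand-in for the fingerprint derivation using the standard library's DefaultHasher. This is not the SDK's real derivation, only an illustration that every identity input perturbs the id:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy fingerprint: digest every identity input together. Any change
// to any field yields an unrelated id, which nobody is serving.
fn toy_deployment_id(name: &str, window: &str, init: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    (name, window, init).hash(&mut h);
    h.finish()
}

fn main() {
    let a = toy_deployment_id("acme.mainnet", "interactive", &[0x7f, 0x3a]);
    let b = toy_deployment_id("acme.mainnet2", "interactive", &[0x7f, 0x3a]);
    assert_ne!(a, b); // one character off in the name: different deployment
}
```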
2. Operator’s committee isn’t up
The fingerprint is right, but nobody is currently serving it. The
SDK cannot distinguish “nobody serves this” from “operator isn’t up
yet” without an on-network registry — both surface as
ConnectTimeout.
Fix: ask the operator whether the deployment is live.
3. Bootstrap peers unreachable
Even if the instance name is right and the committee is up, your network never bonded to the universe — so it never found the committee. Usually shows up alongside no peer-catalog growth.
Fix: check the bootstrap peer list. See Connecting — Cold-start checklist.
4. TDX posture mismatch
Silent rejection at the bond layer from a TDX-gated deployment
often looks like ConnectTimeout rather than a clear Attestation
error. Common when your client is built without the tee-tdx
feature against a TDX-gated operator.
Fix: see TEE-gated deployments.
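A cheap startup guard can turn this silent failure into a loud one. A sketch assuming only the tee-tdx feature name used on this page; the gating flag is a hypothetical value you would set by hand from the operator's handoff, not something the SDK provides:

```rust
// Warn when the client targets a TDX-gated deployment but was
// compiled without the `tee-tdx` feature. `tdx_gated` comes from the
// operator's handoff.
fn posture_warning(tdx_gated: bool) -> Option<&'static str> {
    if tdx_gated && !cfg!(feature = "tee-tdx") {
        Some(
            "built without --features tee-tdx against a TDX-gated \
             deployment; bonding will surface as ConnectTimeout",
        )
    } else {
        None
    }
}

fn main() {
    if let Some(msg) = posture_warning(true) {
        eprintln!("warning: {msg}");
    }
}
```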
Symptom: a Zipnet::<D>::* constructor returns WrongUniverse
Your Arc<Network> was built against a different NetworkId than
zipnet::UNIVERSE. The error payload tells you both values:
match zipnet::Zipnet::<Note>::submit(&network, &ACME_MAINNET).await {
    Err(zipnet::Error::WrongUniverse { expected, actual }) => {
        tracing::error!(%expected, %actual, "network on wrong universe");
    }
    _ => {}
}
Fix: build the network with Network::new(UNIVERSE) or
Network::builder(UNIVERSE). There is no way to tunnel zipnet over
a non-universe network.
Symptom: a Zipnet::<D>::* constructor returns Attestation
TDX attestation failed. The string payload names the specific failure from the mosaik TDX stack.
Common causes:
- You built with tee-tdx but aren't running inside a TDX guest.
- Your MR_TD differs from the operator's expected value (fresh image you haven't rebuilt, or the operator rotated).
- Your quote has expired.
Symptom: a receipt’s Outcome is Collided
Another client hashed to the same slot this round. Both payloads get XOR-corrupted; no observable message lands for either of you.
Fix: retry on the next round. See Publishing — Retry policy.
Persistent collisions are a signal that the deployment is oversubscribed
for its num_slots — an operator-side tuning problem, not a user one.
Collision probability per pair per round is 1 / num_slots; for N
clients the expected number of collisions per round is
C(N, 2) / num_slots.
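The arithmetic is easy to sanity-check yourself. A small helper, with illustrative numbers that are not any real deployment's parameters:

```rust
// Expected number of colliding pairs per round: C(N, 2) / num_slots.
fn expected_collisions(n_clients: u64, num_slots: u64) -> f64 {
    let pairs = n_clients * n_clients.saturating_sub(1) / 2;
    pairs as f64 / num_slots as f64
}

fn main() {
    // 100 clients over 4096 slots: 4950 pairs / 4096 slots, so you
    // should expect roughly one collision per round.
    println!("{:.2}", expected_collisions(100, 4096)); // prints 1.21
}
```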
Symptom: a receipt’s Outcome is Dropped
The aggregator never forwarded your envelope into a committed aggregate. Usually transient:
- Aggregator was offline that round.
- Your registration hadn’t propagated yet (first few seconds after opening the submitter).
Fix: retry. Repeated Dropped across many rounds means the
aggregator is unreachable from you — check the peer catalog and
bootstrap peers, then contact the operator.
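Since Collided and Dropped share the same remedy, a bounded retry loop usually covers both. A sketch with the SDK's receipt outcome stubbed locally so the control flow stands alone; real code would await the next round boundary between attempts:

```rust
// Stub of the receipt outcome; the real enum lives in the SDK.
#[derive(Debug, PartialEq)]
enum Outcome {
    Delivered,
    Collided,
    Dropped,
}

// Retry transient outcomes for up to `max_rounds` rounds, then give
// up so the caller can escalate to the operator.
fn submit_with_retry(
    mut submit_once: impl FnMut() -> Outcome,
    max_rounds: u32,
) -> Result<(), &'static str> {
    for _ in 0..max_rounds {
        match submit_once() {
            Outcome::Delivered => return Ok(()),
            // Both transient: resend on the next round.
            Outcome::Collided | Outcome::Dropped => continue,
        }
    }
    Err("still undelivered after max_rounds; escalate to the operator")
}

fn main() {
    let mut attempts = 0;
    let result = submit_with_retry(
        || {
            attempts += 1;
            if attempts < 3 { Outcome::Dropped } else { Outcome::Delivered }
        },
        5,
    );
    assert!(result.is_ok());
}
```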
Symptom: the reader sees no values for a long time
Two possibilities:
1. The committee is stuck
The cluster is not finalizing rounds. Contact the operator.
2. Your handle hasn’t caught up yet
The Zipnet::<D>::* constructors wait for the first live round
roster before returning, so once you hold a Reader<D> values
should start at the next round boundary. If they do not, you are
not reaching the broadcast collection’s group — same checks as for
ConnectTimeout (config, bootstrap, UDP egress, TDX).
Symptom: send, or the stream's next, returns Shutdown
The handle is closing. Either you dropped every clone of the
submitter, dropped the reader/receipts stream, or the underlying
Network went down.
Fix: check that the Arc<Network> is still alive and that no
other part of your code dropped the handle. If this is intentional,
the error is just the post-drop signal.
Symptom: Error::Protocol(…) with an opaque string
The SDK bubbled up a lower-level mosaik or zipnet-protocol failure. The string content is for humans — do not pattern-match on it.
Fix: enable verbose logging and inspect the mosaik-layer event stream:
RUST_LOG=info,zipnet=debug,mosaik=info cargo run
If the root cause is in mosaik, the mosaik book has better diagnostics than this page can. Open a zipnet issue with the log excerpt if the failure looks zipnet-specific.
Symptom: reader lags and misses values
Your per-item handler is slower than the deployment’s round cadence. Internal broadcast channels drop items rather than stall the SDK.
Fix: offload heavy per-item work to a separate task. See Reading — Handling a slow consumer.
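One way to do that with nothing but the standard library: hand items to a worker thread over a bounded channel so the SDK-facing loop stays fast. A sketch only; in real code the producer side would be your Reader<D> loop, and the sum stands in for real per-item work:

```rust
use std::sync::mpsc;
use std::thread;

// Drain `items` through a bounded queue into a worker thread that does
// the heavy per-item work, keeping the producer loop fast. Returns the
// worker's result.
fn offload_and_sum(items: Vec<u64>, queue_depth: usize) -> u64 {
    let (tx, rx) = mpsc::sync_channel::<u64>(queue_depth);
    let worker = thread::spawn(move || {
        let mut sum = 0u64;
        for v in rx {
            // Heavy per-item work happens here, off the reader's path.
            sum += v;
        }
        sum
    });
    for v in items {
        // `send` on a sync_channel blocks when the queue is full; this
        // is where you would instead decide to drop if you must never
        // stall the reader.
        tx.send(v).unwrap();
    }
    drop(tx); // close the channel so the worker's loop ends
    worker.join().unwrap()
}

fn main() {
    println!("{}", offload_and_sum((0..10).collect(), 64)); // prints 45
}
```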
Symptom: my client compiled against one version, the operator upgraded
Mosaik is pinned to =0.3.17 on both sides; the zipnet and
zipnet-proto baselines must also match the deployment. If
WIRE_VERSION or
round-parameter defaults change, your client derives a different
deployment id and the Zipnet::<D>::* constructors return
ConnectTimeout.
Fix: keep your zipnet dep version aligned with the operator’s release notes. Mosaik stays pinned.
When to escalate to the operator
- A Zipnet::<D>::* constructor consistently fails with ConnectTimeout after the config, bootstrap, and universe have all been verified.
- Receipts keep returning Outcome::Dropped across many rounds.
- Your reader stays open but sees no values over several round periods.
When you escalate, include:
- Your mosaik version (=0.3.17) and zipnet SDK version.
- The full Config (name, window, init) and the Zipnet::<D>::deployment_id(&CONFIG) you derive locally.
- Whether you built with tee-tdx and, if so, your client's MR_TD.
- A 60-second log excerpt at RUST_LOG=info,zipnet=debug,mosaik=info.