What “peer to peer” means here, and why I keep mentioning it

I keep saying XES is peer to peer, and I keep getting questions about what that actually means. So here it is in plain English.

When you make a call on XES, the audio travels directly between your browser and the other person’s browser. Encrypted, end to end. There’s no XES server in the middle through which your voice flows. That’s peer to peer (or P2P, or “mesh” in the version with more than two people).

Compare that to how most chat services work. With Discord, Zoom, or AirTalk, the audio is sent up to a central server, which mixes or forwards it and delivers it back down to the other side. That central server is called an SFU (selective forwarding unit) when it forwards each stream as is, or an MCU (multipoint control unit) when it mixes them into one. It has its uses: better quality for big group calls, easier moderation, easier recording. The downside is that whoever runs that server can technically see, store, or share whatever flows through it. They mostly don’t. But they can.

P2P removes that “but they can.” If our server doesn’t carry the audio, our server can’t do anything with it. There’s no recording, no transcription, no live monitoring, no “just in case” storage. The architectural property is what makes the privacy claim real.

A few caveats, since I’m being honest about it.

The connection has to be set up somehow. The XES server does the matchmaking and exchanges the signalling messages that let the two browsers find each other. Once the call starts, the server steps out of the audio path.
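To make "matchmaking and signalling" a bit more concrete, here is a minimal sketch of what a signalling relay does. It is not XES's actual code: the message shapes, the `SignallingRelay` class, and the peer names are all hypothetical, and a real service would run this over something like WebSockets rather than in memory. The point is only that the messages are small setup notes (offers, answers, network candidates), not audio.

```typescript
// Hypothetical message shapes; the real XES wire format is not described in this post.
type SignalKind = "offer" | "answer" | "ice-candidate";

interface SignalMessage {
  kind: SignalKind;
  from: string;
  to: string;
  payload: string; // session description text or a serialized network candidate, never audio
}

type Handler = (msg: SignalMessage) => void;

// In-memory stand-in for a signalling server: it only passes setup messages along.
class SignallingRelay {
  private handlers = new Map<string, Handler>();

  join(peerId: string, onMessage: Handler): void {
    this.handlers.set(peerId, onMessage);
  }

  leave(peerId: string): void {
    this.handlers.delete(peerId);
  }

  // Forwards a setup message to its recipient; returns false if they are not connected.
  send(msg: SignalMessage): boolean {
    const handler = this.handlers.get(msg.to);
    if (handler === undefined) {
      return false;
    }
    handler(msg);
    return true;
  }
}
```

Once both sides have swapped an offer, an answer, and their candidates, the browsers have what they need to connect to each other, and nothing in this relay ever touches the media.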

About 20% of users can’t establish a direct P2P connection because their network is hostile to it: restrictive corporate firewalls, certain mobile carriers, double-NAT setups. For those cases XES falls back to TURN, a relay that forwards the encrypted media without decrypting it. The audio is still encrypted end to end between the two browsers; it just bounces through a relay to get there.
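For anyone wondering what the fallback looks like from the browser's side, here is a rough sketch. The field names (`iceServers`, `urls`, `username`, `credential`, `iceTransportPolicy`) and the candidate types mirror the standard WebRTC configuration, but the hostnames, credentials, and the `buildIceConfig` and `isRelayed` helpers are placeholders made up for illustration, not XES's real setup.

```typescript
// Shapes mirror the browser's RTCConfiguration; declared here so the sketch runs outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

// Placeholder hostnames and credentials (assumptions, not real XES servers).
function buildIceConfig(forceRelay: boolean): IceConfig {
  return {
    iceServers: [
      // STUN helps a browser discover its public address for a direct connection.
      { urls: "stun:stun.example.org:3478" },
      // TURN is the relay used when no direct path can be found.
      { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
    ],
    // "all" tries direct paths first; "relay" only ever uses TURN.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

type CandidateType = "host" | "srflx" | "prflx" | "relay";

// A call is going through TURN if either end of the selected pair is a relay candidate.
function isRelayed(local: CandidateType, remote: CandidateType): boolean {
  return local === "relay" || remote === "relay";
}
```

In a browser, an object of this shape is what gets passed to `new RTCPeerConnection(config)`. With the policy set to `"all"`, the browser tries direct routes and only ends up on the relay when nothing else works.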

Group rooms (more than two people) work as a mesh by default: every participant connects directly to every other. This caps at about eight participants, because each person has to upload a separate copy of their audio to everyone else, and the bandwidth gets too high beyond that. For now, XES doesn’t use an SFU even for groups. If we ever do, it’ll be opt-in and labelled, not a quiet upgrade.
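The arithmetic behind that cap is easy to sketch. In a mesh of n people, each person holds n - 1 connections and the room as a whole has n(n - 1)/2 of them. The `meshCost` helper below is hypothetical, and the 50 kbps per audio stream used in the usage note is an assumed round number for illustration, not a measured XES figure.

```typescript
interface MeshCost {
  linksInRoom: number;       // total direct connections across the whole room
  streamsPerPeer: number;    // streams each participant sends (and receives)
  uploadKbpsPerPeer: number; // upload each participant needs for their outgoing copies
}

// kbpsPerStream is whatever one audio stream costs; the caller supplies the assumption.
function meshCost(participants: number, kbpsPerStream: number): MeshCost {
  if (Math.floor(participants) !== participants || participants < 2) {
    throw new RangeError("a call needs a whole number of at least 2 participants");
  }
  const streamsPerPeer = participants - 1;
  return {
    linksInRoom: (participants * streamsPerPeer) / 2,
    streamsPerPeer,
    uploadKbpsPerPeer: streamsPerPeer * kbpsPerStream,
  };
}
```

With the assumed 50 kbps per stream, `meshCost(2, 50)` gives 1 link and 50 kbps of upload per person, while `meshCost(8, 50)` gives 28 links in the room and 350 kbps of upload per person. The per-person cost grows with every extra participant, which is why a mesh stops being practical past a certain room size.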

That’s the whole P2P story. The reason it matters is that every other privacy claim XES makes rests on it.
