Why voice and not text

The intuition that text is the “safer” option for a first conversation with a stranger is wrong, and it’s wrong in an interesting way.

Text feels safer because you can edit it. You can compose, delete, rewrite. You can think for as long as you want before pressing send. That illusion of control is what makes it feel like the lower-risk choice. The hidden cost is that the same editability creates the conditions for spiralling. You write a message. You delete it. You re-read what they sent. You write another one. You delete that too. In the time it would have taken to just say the thing on a voice call, you’ve run an entire imagined conversation in your head about whether you sound stupid.

Voice removes that loop. You say a sentence. They hear it. They reply. The conversation moves whether you want it to or not. Awkward pauses last seconds, not days. Tone carries warmth or boredom or surprise immediately, so you know how the other person feels about what you said without having to guess from a delayed reply.

The other thing text strips out is the sound of someone laughing. The way someone laughs is often the entire reason a call goes well. You can’t type a laugh in a way that lands. “haha” reads as polite. A real laugh from the person on the other end is one of the most reliable signals in conversation that the thing you said was actually funny, or that they liked you, or both.

None of this means text is useless. XES keeps typed messages inside calls for links, quick notes, or moments when saying a thing out loud is awkward. But the match itself is voice-first, because voice carries warmth and pace that text has to work hard to replace.
