How we moderate

Moderation is the part of running a chat platform that nobody quite knows how to do well. Every approach has a major failure mode. Mods who watch live calls catch the worst behaviour and invade everyone’s privacy in the process. Pure community-flagging works fine until the bad actors learn to outpace the flaggers. AI classifiers catch the obvious and miss everything subtle, and to work at all they need to see the content, which puts you back in the privacy hole. There’s no consensus answer to this, in the industry or anywhere else. Everyone’s picking the failure mode they think they can live with.

XES’s pick is community-driven and structural, and it exists because of a specific technical constraint: the audio doesn’t pass through our servers. We couldn’t monitor live audio if we wanted to; there’s nothing on our end to listen to. That choice is what makes the privacy story real, and it forces the moderation system to be a different shape from what most platforms do.

Here’s the actual shape.

After every call, both sides get a rate prompt. Thumbs up, thumbs down, block, or skip. Each input feeds the other person’s trust score. The math is rater-weighted: a downvote from someone with a high trust score counts for more than a downvote from someone who joined yesterday and downvoted everyone they spoke to. This is the bit that stops the system being gamed by anyone organising a brigade.
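As a rough illustration of rater weighting, here is a minimal sketch. The post doesn't give the formula: the 0-100 scale, the step size, the per-choice effects, and the function names are all assumptions made for the example.

```python
DEFAULT_TRUST = 50.0  # new accounts start at 50, per the post
MAX_STEP = 4.0        # assumed: largest change a single rating can cause

# Assumed base effect of each rate-prompt choice, before weighting.
BASE_EFFECT = {"up": 1.0, "down": -1.0, "block": -2.0, "skip": 0.0}


def rater_weight(rater_trust: float) -> float:
    """Map the rater's own trust (0-100) to a weight between 0 and 1."""
    return max(0.0, min(1.0, rater_trust / 100.0))


def apply_rating(target_trust: float, rater_trust: float, choice: str) -> float:
    """Return the target's new trust score after one rating, clamped to 0-100."""
    delta = BASE_EFFECT[choice] * MAX_STEP * rater_weight(rater_trust)
    return max(0.0, min(100.0, target_trust + delta))
```

With these assumed numbers, a downvote from a rater at 90 moves the target about nine times as far as one from a rater at 10, which is the property that makes a brigade of low-trust accounts expensive to run.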

Trust scores feed back into the queue. Users below a threshold sit in a cooldown for a stretch before they can re-queue. The cooldowns get longer the further you drop. By the time you’ve collected enough downvotes to fall below the bottom band, you’re waiting 24 hours before every re-queue. A determined bad actor can come back as a fresh account, but a fresh account starts at the default trust score and gets weighted accordingly, and the same cycle starts again from scratch.
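The banded cooldown can be sketched as a lookup table. Only the 24-hour bottom band comes from the post; the thresholds, the intermediate durations, and the names below are illustrative assumptions.

```python
from datetime import timedelta

# Assumed bands as (trust below this value, cooldown), lowest band first.
# Only the 24-hour figure is from the post.
COOLDOWN_BANDS = [
    (10.0, timedelta(hours=24)),
    (20.0, timedelta(hours=1)),
    (30.0, timedelta(minutes=10)),
]


def cooldown_for(trust: float) -> timedelta:
    """Return the re-queue cooldown for a trust score; zero above every band."""
    for upper_bound, cooldown in COOLDOWN_BANDS:
        if trust < upper_bound:
            return cooldown
    return timedelta(0)
```

A user at the default score of 50 gets `timedelta(0)` here, so the mechanism costs nothing for anyone who hasn't been rated down repeatedly.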

The moderator queue catches what the rating system doesn’t. When a downvote or block is filed with a reason — harassment, racism, sexual content, suspected under-eighteen — or with free text, it lands on a dashboard. The current team is about three of us. We act on most reports within a day. Underage reports get same-day attention, since that one carries actual legal weight.
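One way the dashboard ordering could work is sketched below, assuming reports carry a reason string and a timestamp. The `Report` type, the reason identifiers, and the `triage` function are hypothetical; the post only says underage reports are handled first.

```python
from dataclasses import dataclass
from datetime import datetime

# Reasons that jump the queue. The identifier is an assumed spelling of the
# "suspected under-eighteen" reason named in the post.
URGENT_REASONS = {"suspected_under_eighteen"}


@dataclass
class Report:
    reason: str          # e.g. "harassment", "racism", "sexual_content"
    filed_at: datetime
    free_text: str = ""


def triage(reports: list[Report]) -> list[Report]:
    """Order the dashboard: urgent reasons first, then oldest report first."""
    return sorted(reports, key=lambda r: (r.reason not in URGENT_REASONS, r.filed_at))
```

Sorting on a tuple keeps the rule in one place: `False` sorts before `True`, so urgent reports lead, and ties fall back to filing time.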

The thing this system doesn’t try to do is moderate the content of any single call. We can’t; no audio reaches us to review. What it produces instead is an aggregate effect: a platform that tilts statistically toward better calls. Bad behaviour gets expensive for the bad actor and stays effectively free for everyone else.

Two things that work without a moderator

Two of the more useful anti-abuse mechanisms on the site don’t involve a person at all. The first is that the gender filter is earned rather than given. A brand-new account can’t filter by gender; it opens up after a few good conversations. The single most common way one of these sites turns nasty is people signing up purely to skip past everyone until they hit a particular kind of person, usually women, and a filter you have to earn takes the fuel out of that before it can start. By the time someone could abuse the filter, they’ve had to behave well enough to earn it — and the rating system has had a few calls’ worth of chances to flag them if they didn’t.
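A minimal sketch of the earned filter follows. The post says only "a few good conversations", so the count, the definition of a good call, and every name here are assumptions.

```python
from dataclasses import dataclass

REQUIRED_GOOD_CALLS = 3      # assumed; the post only says "a few"
MIN_GOOD_CALL_SECONDS = 60   # assumed definition of a real conversation


@dataclass
class CallRecord:
    duration_seconds: float
    rating_received: str  # "up", "down", "block", or "skip"


def is_good_call(call: CallRecord) -> bool:
    """A call counts if it lasted long enough and wasn't rated down or blocked."""
    return (call.duration_seconds >= MIN_GOOD_CALL_SECONDS
            and call.rating_received not in ("down", "block"))


def gender_filter_unlocked(history: list[CallRecord]) -> bool:
    """True once the account has enough good calls on record."""
    return sum(is_good_call(c) for c in history) >= REQUIRED_GOOD_CALLS
```

Because the check reads the same call history the rating system writes, every call that counts toward unlocking is also a call where the other person had a chance to flag the account.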

The second is that rapid skipping slows you down. End call after call within a few seconds each and your re-queue starts dragging, escalating up to a few minutes; have one genuine conversation and it resets. Skipping through people that fast is the behaviour that makes the person on the other end feel like livestock going past on a belt, and making it slow and tedious removes most of the incentive without banning anyone. Both of these run automatically, cost nothing to operate, and scale perfectly, because they’re rules in the matchmaker rather than work for a person to do.
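The skip slowdown behaves like a capped exponential backoff with a reset. The sketch below assumes specific cutoffs and a doubling schedule; the post only says "a few seconds", "up to a few minutes", and that one genuine conversation resets it.

```python
QUICK_END_SECONDS = 5.0      # assumed cutoff for "within a few seconds"
GENUINE_CALL_SECONDS = 60.0  # assumed length of a "genuine conversation"
FREE_QUICK_ENDS = 2          # assumed: the first couple of skips cost nothing
BASE_DELAY_SECONDS = 2.0     # assumed first penalty
MAX_DELAY_SECONDS = 180.0    # cap, matching "up to a few minutes"


class SkipThrottle:
    """Tracks one user's run of quick call endings."""

    def __init__(self) -> None:
        self.quick_ends = 0

    def record_call(self, duration_seconds: float) -> None:
        if duration_seconds >= GENUINE_CALL_SECONDS:
            self.quick_ends = 0          # a real conversation resets the run
        elif duration_seconds < QUICK_END_SECONDS:
            self.quick_ends += 1         # another rapid skip
        # calls of in-between length leave the counter unchanged

    def requeue_delay(self) -> float:
        """Seconds to wait before re-queueing; doubles per skip, then caps."""
        over = self.quick_ends - FREE_QUICK_ENDS
        if over <= 0:
            return 0.0
        return min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * 2 ** (over - 1))
```

The free allowance is a design choice in this sketch rather than something the post states: it keeps an occasional quick exit from being punished while still making sustained skipping tedious.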

Where this bends at scale

The honest part of any moderation post is what fails when the user count goes up. Here’s what I’d expect.

The moderator team has to scale with reports, not users. If the site quadruples, the report volume probably triples (some friction reduction from word-of-mouth growth, lower-trust user mix), and the team has to keep response times in the same ballpark. As a planning shorthand, that works out to roughly one moderator per couple of thousand daily active users, with the trust system carrying everything else. That holds because most moderation already happens automatically: the queue gating, the cooldowns, and the block-aware matchmaking all run without anyone reading anything. The mods are catching the cases those mechanisms can’t handle on their own.

The trust math gets trickier with more new users. New users sit at the default of 50, which is where weighted downvoting is at its weakest. A flood of new users means a flood of low-weight voters, which means the rating-driven part of the system slows down. The fix is probably an extra signal: a minimum number of calls before a vote counts at full weight, or some kind of corroboration check across multiple raters. Neither is in the code yet because the current size doesn’t need them.
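Neither fix exists yet, so the following is only a sketch of what the two proposed signals might look like. The ramp length, the rater count, and all function names are hypothetical.

```python
FULL_WEIGHT_CALLS = 10   # assumed: calls needed before a vote counts in full
MIN_DISTINCT_RATERS = 3  # assumed: independent downvoters needed to corroborate


def experience_factor(calls_completed: int) -> float:
    """Ramp linearly from 0 to 1 over the rater's first calls."""
    return min(1.0, max(0, calls_completed) / FULL_WEIGHT_CALLS)


def vote_weight(rater_trust: float, calls_completed: int) -> float:
    """Combine trust-based weight (0-1) with the experience ramp."""
    return (rater_trust / 100.0) * experience_factor(calls_completed)


def corroborated(downvoter_ids: list[str]) -> bool:
    """True when enough distinct accounts have downvoted the same user."""
    return len(set(downvoter_ids)) >= MIN_DISTINCT_RATERS
```

Under these assumptions a brand-new account at the default of 50 has zero vote weight on its first call and reaches 0.5 after ten, so a flood of new sign-ups can't move scores until its members have actually used the site.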

The block-aware matching is computed per queue scan. Right now it’s a single database query. At ten times the current size it’d want an in-memory cache; at a hundred times it’d want partitioning. Not exciting work, but doable.
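For the in-memory cache step, a minimal sketch might look like the following. It assumes a block applies in both directions and that users are identified by string IDs; `BlockIndex` and `first_match` are hypothetical names, not the site's code.

```python
from typing import Optional


class BlockIndex:
    """In-memory set of blocked pairs, stored without regard to direction."""

    def __init__(self) -> None:
        self._pairs: set[frozenset[str]] = set()

    def add_block(self, blocker: str, blocked: str) -> None:
        self._pairs.add(frozenset((blocker, blocked)))

    def is_blocked(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self._pairs


def first_match(user: str, queue: list[str], blocks: BlockIndex) -> Optional[str]:
    """Return the first queued user who is not in a block pair with `user`."""
    for candidate in queue:
        if candidate != user and not blocks.is_blocked(user, candidate):
            return candidate
    return None
```

Each lookup is a single hash check, so a queue scan stays linear in queue length without a database round trip per candidate. Keeping the index in sync with the database is the part this sketch leaves out.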

The bit that actually worries me is moderator burnout. Reading reports about people behaving badly on the platform is genuinely grim work, and there’s no version of this that gets less grim with practice. The way to keep mods sustainable is to share the load (rotation, time off, peer support), keep the queue manageable so it doesn’t pile up, and aggressively auto-resolve cases the trust system already handled. The platform that doesn’t think about its moderator team’s wellbeing is the one whose moderators quietly stop reading their queue.

What it doesn’t do, deliberately

A few omissions worth being upfront about.

No keyword filters on the data channel. Data-channel traffic does pass through us (that’s how in-call messages and games work), but the contents aren’t filtered. Bad typed behaviour gets caught at the call level via the rate system. Adding a keyword filter would mean keeping a list of words, which means caring about the words, which doesn’t generalise to anything useful and creates a queue of “is ‘X’ the bad word or the medical term” appeals that no one wants to deal with.

No AI classifier on the audio. We couldn’t run one anyway, since the audio isn’t ours. We also wouldn’t if we could, for the same privacy reason.

No proactive identity verification. No IDs, phone numbers, or social account logins for general users. Guests stay anonymous. The cost is that a banned user can come back as a fresh guest; the benefit is that everyone else gets to use the site without handing over identifying information. We think the trust system catches the cycling-fresh-account case fast enough that the trade is worth it. Time will tell.

Most chat platforms don’t work this way, partly because they can’t (the architecture is different), and partly because the more common approaches have failure modes that are harder to see from outside until you’re inside them.
