Update
I replaced our 24-hour human moderation team with an AI system that now handles every community report, across over 10,000 hours of live spoken audio per day. We cut moderation costs by $4,000 per month, reduced daily user reports by more than half, and brought appeal rates down to one-third of what they were. It’s simply faster, more accurate, and more consistent than a person could ever be.
Observation
At Hilokal, a social audio language-learning app, one bad actor can ruin the experience for dozens of users. And yet the damage is subtle: hard to trace, easy to ignore. It’s the same with all social media, really. We used to handle this problem with a human team covering every time zone. They processed about 150 user reports per day, each manually reviewed in our admin dashboard. The KPI: respond to every report within one hour of arrival and minimize appeals.
Now? The same volume is handled in under five minutes, autonomously.
The shift happened in layers.
First: we wired a two-minute rolling audio buffer into every public voice channel. Conversations are never recorded, and the rolling buffer is almost always discarded. But when a user is reported, the system snapshots that two-minute audio window, transcribes it with Whisper, translates it, and packages it with metadata: public chat messages, shared images, profile details of both parties, any uploaded evidence, plus prior infractions and IP-linked accounts.
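Here’s a minimal sketch of how that buffer-and-snapshot flow could work. The one-second chunking, the packet field names, and the transcribe_with_whisper and translate helpers are illustrative assumptions, not Hilokal’s production code.

```python
from collections import deque

BUFFER_SECONDS = 120   # the two-minute rolling window
CHUNK_SECONDS = 1      # assumption: audio arrives in one-second chunks


class RollingAudioBuffer:
    """Holds only the most recent two minutes of a channel's audio."""

    def __init__(self) -> None:
        # Old chunks fall off the left end automatically, so nothing
        # older than two minutes ever exists unless a report arrives.
        self.chunks: deque[bytes] = deque(maxlen=BUFFER_SECONDS // CHUNK_SECONDS)

    def push(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def snapshot(self) -> bytes:
        # Called only when a user files a report.
        return b"".join(self.chunks)


def transcribe_with_whisper(audio: bytes) -> str:
    ...  # placeholder: wraps a Whisper transcription call


def translate(text: str) -> str:
    ...  # placeholder: machine translation for the reviewers


def build_case_packet(buffer: RollingAudioBuffer, report: dict) -> dict:
    """Bundle the audio snapshot with the metadata described above."""
    transcript = transcribe_with_whisper(buffer.snapshot())
    return {
        "transcript": transcript,
        "translation": translate(transcript),
        "chat_messages": report["chat_messages"],
        "shared_images": report["shared_images"],
        "profiles": report["profiles"],  # both parties
        "evidence": report.get("evidence", []),
        "prior_infractions": report["prior_infractions"],
        "ip_linked_accounts": report["ip_linked_accounts"],
    }
```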
Second: GPT-4.1 evaluates the case through a custom decision prompt at low temperature. We tried o3, OpenAI’s reasoning model, but GPT-4.1 at a temperature of 0.2 works better, and it’s cheaper.
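For concreteness, a hedged sketch of what that call might look like with the OpenAI Python SDK; the packet contents and the assumed output shape come from the description in this post, not from our actual prompt.

```python
import json
from openai import OpenAI

client = OpenAI()

DECISION_PROMPT = "..."  # the ~2,000-word prompt described below


def evaluate_case(case_packet: dict) -> dict:
    """Ask GPT-4.1 for a ruling on one report."""
    response = client.chat.completions.create(
        model="gpt-4.1",
        temperature=0.2,  # low temperature for consistent rulings
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": DECISION_PROMPT},
            {"role": "user", "content": json.dumps(case_packet)},
        ],
    )
    # Assumed output: {"action": ..., "confidence": 1-100, "reasoning": ...}
    return json.loads(response.choices[0].message.content)
```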
It’s not just rules, it’s reasoning. There’s a beastly prompt, over 2,000 words, that includes our public-facing community guidelines, our report-processing logic, and all the metadata as variables. It’s given few-shot examples with confidence levels from 1 to 100. If the AI decides on an action with a confidence above 80, it’s executed automatically. Server-side logic handles the cascade: messages to both the reported user and the reporter, chat/audio/app blocks, and punishment durations ranging from one hour to a permanent ban.
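The gating itself can be a few lines of server-side logic. In the sketch below, the action names, durations, and helper functions are hypothetical stand-ins for the cascade just described.

```python
CONFIDENCE_THRESHOLD = 80

# Hypothetical mapping from ruling to block duration in hours
# (None = permanent ban).
PUNISHMENTS = {
    "warning": 0,
    "chat_block": 1,
    "audio_block": 24,
    "app_block": 24 * 7,
    "permanent_ban": None,
}


def queue_for_human_review(case: dict, decision: dict) -> None:
    ...  # placeholder: hand low-confidence cases to the supervisor


def apply_block(user_id: str, action: str, hours: int | None) -> None:
    ...  # placeholder: enforce the chat/audio/app block


def notify(user_id: str, message: str) -> None:
    ...  # placeholder: in-app messages to reporter and reported


def execute_decision(case: dict, decision: dict) -> None:
    if decision["confidence"] <= CONFIDENCE_THRESHOLD:
        # Not confident enough to act autonomously.
        queue_for_human_review(case, decision)
        return
    apply_block(case["reported_user"], decision["action"],
                PUNISHMENTS[decision["action"]])
    notify(case["reported_user"], decision["reasoning"])
    notify(case["reporter"], "Your report was reviewed and acted on.")
```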
There’s a second, proactive layer too. We run all public-facing text (like chatroom titles) through a moderation LLM. That filter alone cut our daily report volume from 150 to about 60.
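This post doesn’t name that filter model, so the sketch below uses OpenAI’s moderation endpoint as a stand-in for the idea: score every public-facing string before it goes live.

```python
from openai import OpenAI

client = OpenAI()


def title_allowed(title: str) -> bool:
    """Screen a chatroom title before it is shown publicly."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # stand-in; the actual model isn't named
        input=title,
    )
    return not result.results[0].flagged
```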
Human oversight is down from twenty-four hours a day to a single hour. One supervisor reviews appeals first, then permanent decisions, then edge cases. We ran the GPT moderator in parallel with the human team for a month. It took some adjusting at first, but soon the moderators were simply accepting the lightning-fast AI decisions. Every once in a while they’d find an exception.
Now, the supervisor doesn’t tweak prompts directly. Each AI response includes not just a decision and a confidence level, but a recommendation for improving the system’s inputs: what information was missing from the community guidelines or metadata, and what might help next time. A senior prompt engineer reviews those suggestions and refines the prompt context accordingly.
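One way to picture that feedback channel is as an extra field on the ruling itself. This is an assumed shape, not our exact format:

```python
from dataclasses import dataclass, field


@dataclass
class ModeratorRuling:
    """Assumed shape of one AI response in this pipeline."""
    action: str       # e.g. "chat_block", "permanent_ban"
    confidence: int   # 1-100; above 80 is auto-executed
    reasoning: str    # kept for the audit trail
    # What was missing from the guidelines or metadata, and what would
    # help next time. Reviewed by a prompt engineer, never auto-applied.
    improvement_suggestions: list[str] = field(default_factory=list)
```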
Appeals have dropped to a third of what they were. Reports are down more than half. Every decision is faster, more consistent, and easier to audit.
Capsule Note
In our own business, we are replacing people with AI. Surely this is happening across all industries.
Signing off, message set for future delivery.
– David
During the Upload is a blog about living through AI as a developer and new dad. I’m a startup founder building AI tutors and agents, and in my spare time I build other stuff. Posts cover building with AI and takes on the news.
Each post is sent in a time capsule to my future self, 25 years out, to keep the perspective grounded. Consider subscribing to get notified when new, forward-thinking articles are written by a builder at the forefront of AI.