
I Built an AI Voice Agent for Collections (In a Weekend)


Ever gotten a collection call that felt like talking to a robot? Stiff, scripted, and like all they wanted was to collect, with zero regard for your situation.

Working in fintech, I hear these calls all the time. And honestly? They’re broken. The person on the other end is just doing their job, reading from a script, probably exhausted from making hundreds of calls a day. The borrower feels attacked. Nobody wins. So I thought: what if we could make this… less terrible?

I threw this together in a weekend. Seriously. I wanted to see if current AI voice tech could handle something like collection calls - but in Bahasa Indonesia, and with some actual empathy.

The idea was simple: build a voice agent that can have a real conversation. Not just "kapan bisa bayar?" ("when can you pay?") on repeat, but something that actually listens, understands context, and responds like a human would.

Here’s the thing though - most AI voice tools are built for English. Getting something to sound natural in Bahasa? Way harder than I expected.

ElevenLabs (the voice synthesis service I used) is amazing for English. But Bahasa Indonesia has different rhythms, different intonations. The first few versions sounded like a foreigner trying too hard - technically correct but weirdly off.

I spent most of the weekend just tweaking prompts and voice settings to get something that didn’t make me cringe. It’s still not perfect, but it’s getting there.
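To give a sense of what "tweaking voice settings" actually looked like, here's a minimal sketch of the kind of text-to-speech call I was iterating on. It assumes ElevenLabs' v1 text-to-speech endpoint and their multilingual model; the voice ID and the specific stability/similarity numbers are placeholders for illustration, not the exact values from the demo.

```python
import requests

ELEVENLABS_API_KEY = "your-api-key"    # placeholder
VOICE_ID = "your-bahasa-voice-id"      # placeholder: a voice picked for Bahasa Indonesia

def speak(text: str) -> bytes:
    """Synthesize one agent turn in Bahasa Indonesia and return the MP3 bytes."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVENLABS_API_KEY},
        json={
            "text": text,
            "model_id": "eleven_multilingual_v2",  # the multilingual model handles Bahasa
            "voice_settings": {
                # Lower stability lets the delivery vary more, which sounded
                # less robotic to me in Bahasa; these numbers are illustrative.
                "stability": 0.4,
                "similarity_boost": 0.8,
            },
        },
    )
    resp.raise_for_status()
    return resp.content

audio = speak("Halo, saya menghubungi soal tagihan Anda. Apakah sekarang waktu yang pas untuk bicara?")
with open("agent_turn.mp3", "wb") as f:
    f.write(audio)
```

Most of the weekend was spent nudging those two sliders and re-listening until the pacing stopped feeling translated.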

Here’s my honest take: collection calls don’t have to be dehumanizing. Most people who miss payments aren’t trying to scam anyone - they’re dealing with life. Maybe they lost their job, or had a medical emergency, or just forgot.

An AI agent that can have a real conversation - one that listens to their situation, offers flexible options, maybe even follows up at a better time - that could actually be more humane than a stressed human following a rigid script.
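On the conversation side, most of that behavior lives in the system prompt rather than in code. Here's a trimmed, hypothetical version of the kind of instructions I was experimenting with - not the exact prompt from the demo.

```python
# A trimmed, hypothetical system prompt for the collections agent -
# illustrative only, not the exact prompt used in the demo.
SYSTEM_PROMPT = """
You are a collections agent for an Indonesian fintech. Speak natural,
polite Bahasa Indonesia - casual but respectful, like a helpful CS rep.

Rules:
- Ask about the borrower's situation before asking about payment.
- If they mention hardship (job loss, medical emergency), acknowledge it
  and offer a smaller installment or a later due date.
- If now is a bad time, offer to call back and confirm a specific time.
- Never threaten, shame, or repeat the same question more than twice.
- Keep each turn short - this is a phone call, not an email.
"""
```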

Is this demo production-ready? Absolutely not. But it shows what’s possible.

The tech is there. Now the question is whether the industry wants to use it to squeeze people harder, or to make a genuinely unpleasant experience a little more human.

I’m betting on the latter. Maybe I’m naive, but oh well.

Built with v0 for the frontend and ElevenLabs for voice synthesis.


