Earlier this year my Polish parents flew to Egypt to visit me and my wife. They spoke zero Arabic. My in-laws spoke zero Polish. English helped a little, but only I could glue everyone together—and that meant I spent the whole trip translating instead of catching up.
I promised myself that the next time we met, I would bring something better than Google Translate on my phone. That vow became LiveTranslator, a full-stack project that now lives in /opt/stack/livetranslator. In seven intense days I built a live, room-based translation platform with speech-to-text, machine translation, and streaming captions.
The sprint plan
Day 0 was a whiteboard session turning frustration into requirements:
- Conversations need to feel live and natural.
- Multiple people speaking at once must stay intelligible.
- Quality and cost should adapt to each language pair.
- I can’t rely on a single provider—failover is mandatory.
Architecture choices
The architecture captured in DOCUMENTATION.md guided the build:
- FastAPI backend with WebSocket hubs for ultra-low latency.
- React/Vite frontend that renders live captions, language badges, and presence to everyone in the room.
- PostgreSQL + Redis for routing tables, cost tracking, and connection pooling.
- Provider abstraction layers for both speech-to-text (STT) and machine translation (MT), each with language-aware routing (a minimal interface sketch follows this list).
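To make the abstraction concrete, here is a minimal sketch of what an MT provider interface can look like. The names (`MTProvider`, `translate`, `supports`, the language sets) are illustrative, not the actual LiveTranslator code:

```python
from abc import ABC, abstractmethod


class MTProvider(ABC):
    """Common interface every machine-translation backend implements."""

    name: str

    @abstractmethod
    async def translate(self, text: str, src: str, tgt: str) -> str:
        """Translate text from src to tgt (ISO 639-1 codes)."""

    @abstractmethod
    def supports(self, src: str, tgt: str) -> bool:
        """Whether this provider handles the language pair at all."""


class DeepLProvider(MTProvider):
    name = "deepl"

    def supports(self, src: str, tgt: str) -> bool:
        # DeepL shines on European pairs; the set here is a placeholder.
        return {src, tgt} <= {"pl", "en", "de", "fr"}

    async def translate(self, text: str, src: str, tgt: str) -> str:
        ...  # call the DeepL REST API here
```

The STT side gets a mirror-image interface with a streaming method instead of a request/response one; routing code only ever sees the base class.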
By Day 3, the STT pipeline was alive. A single WebSocket connection per (room, provider) tuple listens to participants and fans out transcriptions to the room within ~2 seconds. Late-final blocking and health monitoring stop repeated words, a surprisingly common issue with streaming APIs.
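The fan-out half is simple enough to sketch. This is a minimal FastAPI version, assuming participants connect at a hypothetical `/ws/{room_id}` route; late-final blocking and provider health checks are omitted for brevity:

```python
from collections import defaultdict

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
rooms: dict[str, set[WebSocket]] = defaultdict(set)


@app.websocket("/ws/{room_id}")
async def join_room(ws: WebSocket, room_id: str):
    await ws.accept()
    rooms[room_id].add(ws)
    try:
        while True:
            # A finalized transcript segment arrives from the STT bridge.
            segment = await ws.receive_json()
            # Fan out to every other participant in the room.
            stale = set()
            for peer in rooms[room_id]:
                if peer is ws:
                    continue
                try:
                    await peer.send_json(segment)
                except Exception:
                    stale.add(peer)  # peer closed mid-broadcast
            rooms[room_id] -= stale
    except WebSocketDisconnect:
        rooms[room_id].discard(ws)
```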
Day 5 brought MT routing online. A dynamic matrix now picks the best provider per language pair (see the routing sketch after this list):
- DeepL for European languages (Polish–English).
- OpenAI GPT-4o-mini for Egyptian Arabic nuance.
- Amazon Translate and Google Cloud as budget and fallback options.
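A hedged sketch of that matrix, building on the `MTProvider` interface above. The ordering per pair expresses preference; unmatched pairs fall through to a default chain, and any provider error triggers failover to the next entry:

```python
# Preference-ordered provider names per (source, target) pair.
ROUTES: dict[tuple[str, str], list[str]] = {
    ("pl", "en"): ["deepl", "google"],
    ("en", "pl"): ["deepl", "google"],
    ("ar", "en"): ["gpt-4o-mini", "amazon"],
    ("ar", "pl"): ["gpt-4o-mini", "google"],
}
DEFAULT_CHAIN = ["amazon", "google"]  # budget / fallback options


async def route_translate(text: str, src: str, tgt: str,
                          providers: dict[str, MTProvider]) -> str:
    """Try providers in preference order; fail over on any error."""
    for name in ROUTES.get((src, tgt), DEFAULT_CHAIN):
        try:
            return await providers[name].translate(text, src, tgt)
        except Exception:
            continue  # provider down or rate-limited: try the next one
    raise RuntimeError(f"no provider available for {src}->{tgt}")
```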
Caching partial translations cut API calls by about 35%, which was critical for keeping costs sane when the family chatters nonstop.
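The cache itself is nothing exotic: key on the normalized text plus the language pair, with a short TTL so corrected finals still win. A sketch using redis-py's asyncio client; the key scheme, TTL, and the `translate_uncached` helper are my own guesses, not the project's code:

```python
import hashlib

import redis.asyncio as redis

r = redis.Redis()
TTL_SECONDS = 300  # partials are short-lived; let corrections win


def cache_key(text: str, src: str, tgt: str) -> str:
    digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    return f"mt:{src}:{tgt}:{digest}"


async def cached_translate(text: str, src: str, tgt: str) -> str:
    key = cache_key(text, src, tgt)
    if (hit := await r.get(key)) is not None:
        return hit.decode()
    result = await translate_uncached(text, src, tgt)  # hypothetical helper
    await r.setex(key, TTL_SECONDS, result)
    return result
```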
The first live test
One week after kickoff, I opened the app on my laptop, handed a tablet to my dad, and spun up the room on a TV in our living room. Within minutes:
- My dad spoke in Polish, and the captions appeared in English for my wife and Arabic for her parents.
- My mother-in-law told a story in Arabic, generating Polish and English transcripts simultaneously.
- When I stepped out to cook dinner, the conversation continued without me. Victory.
Lessons learned
- Multi-provider pays off. Speechmatics delivered higher-quality Polish transcripts, while GPT-4o-mini handled Arabic idioms gracefully.
- Operational dashboards matter. The built-in cost and latency trackers answered skeptical questions from the family CFO (my dad).
- UX is half the battle. Presence indicators, room language flags, and toast notifications kept everyone confident that the system was listening.
What’s next
- Packaging the stack for friends with multilingual families.
- Adding on-device VAD to reduce ambient noise before it hits the cloud.
- Shipping an offline cache so internet blips don’t break the flow.
LiveTranslator turned a stressful visit into the effortless family holiday I always wanted. Now when my parents land in Cairo, they’re greeted not just with hugs but with a seamless translation layer that lets them connect with everyone in the room.