
Shipping LiveTranslator in a Week to Help My Parents Travel

Paweł Gawliczek

Earlier this year my Polish parents flew to Egypt to visit me and my wife. They spoke zero Arabic. My in-laws spoke zero Polish. English helped a little, but only I could glue everyone together—and that meant I spent the whole trip translating instead of catching up.

I promised myself that the next time we met, I would bring something better than Google Translate on my phone. That vow became LiveTranslator, a full-stack project that now lives in /opt/stack/livetranslator. In seven intense days I built a live, room-based translation platform with speech-to-text, machine translation, and streaming captions.

The sprint plan

Day 0 was a whiteboard session turning frustration into requirements:

  1. Conversations need to feel live and natural.
  2. Multiple people speaking at once must stay intelligible.
  3. Quality and cost should adapt to each language pair.
  4. I can’t rely on a single provider—failover is mandatory.

Architecture choices

The architecture captured in DOCUMENTATION.md guided the build:

  • FastAPI backend with WebSocket hubs for ultra-low latency.
  • React/Vite frontend that renders live captions, language badges, and presence to everyone in the room.
  • PostgreSQL + Redis for routing tables, cost tracking, and connection pooling.
  • Provider abstraction layers for both speech-to-text (STT) and machine translation (MT), each with language-aware routing (sketched after this list).
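
Here is a minimal sketch of what that provider abstraction might look like. The class and method names (TranslationProvider, ProviderRouter, pick) are illustrative, not the actual identifiers in the repo:

```python
from abc import ABC, abstractmethod


class TranslationProvider(ABC):
    """One concrete subclass per vendor (DeepL, OpenAI, Amazon, Google)."""

    name: str

    @abstractmethod
    def supports(self, source: str, target: str) -> bool:
        """Whether this vendor handles the language pair at all."""

    @abstractmethod
    async def translate(self, text: str, source: str, target: str) -> str:
        """Translate one utterance; raises on provider errors."""


class ProviderRouter:
    """Language-aware routing: first provider that claims the pair wins."""

    def __init__(self, providers: list[TranslationProvider]) -> None:
        # Order encodes preference, e.g. DeepL before the budget options.
        self.providers = providers

    def pick(self, source: str, target: str) -> TranslationProvider:
        for provider in self.providers:
            if provider.supports(source, target):
                return provider
        raise LookupError(f"no provider for {source}->{target}")
```

The same shape works for the STT side, and failover falls out naturally: pick() just walks the preference list until a provider claims the pair.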

By Day 3, the STT pipeline was alive. A single WebSocket connection per (room, provider) tuple listens to participants and fans out transcriptions within ~2 seconds. Late-final blocking and health monitoring stop repeated words, a surprisingly common issue with streaming APIs.
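
Roughly how the fan-out and late-final blocking fit together, as a sketch; the RoomHub class, event shape, and field names are assumptions for illustration, not the hub in the repo:

```python
class RoomHub:
    """Fans out transcription events to everyone in one room."""

    def __init__(self) -> None:
        self.connections = set()                 # active participant WebSockets
        self.last_final: dict[str, float] = {}   # speaker -> newest final's timestamp

    async def broadcast(self, event: dict) -> None:
        speaker, ts = event["speaker"], event["ts"]
        if event["is_final"]:
            # Late-final blocking: a "final" older than one we already
            # accepted for this speaker is a stale re-send, the usual
            # source of repeated words with streaming STT. Drop it.
            if ts <= self.last_final.get(speaker, 0.0):
                return
            self.last_final[speaker] = ts
        for ws in self.connections:              # FastAPI WebSocket objects
            await ws.send_json(event)
```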

Day 5 brought MT routing online. A dynamic matrix now picks the best provider per language pair (a sketch follows the list):

  • DeepL for European pairs such as Polish–English.
  • OpenAI GPT-4o-mini for Egyptian Arabic nuance.
  • Amazon Translate and Google Cloud Translation as budget and fallback options.
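
In code, the matrix can be as plain as a dictionary keyed by language pair. The provider names and the exact pairs below are illustrative:

```python
# (source, target) -> ordered provider preference. Pairs not listed
# fall through to the budget/fallback chain.
ROUTING_MATRIX: dict[tuple[str, str], list[str]] = {
    ("pl", "en"): ["deepl", "amazon", "google"],
    ("en", "pl"): ["deepl", "amazon", "google"],
    ("ar", "en"): ["openai", "google", "amazon"],
    ("ar", "pl"): ["openai", "google"],
}
DEFAULT_CHAIN = ["amazon", "google"]


def providers_for(source: str, target: str) -> list[str]:
    """Ordered list of provider names to try for this pair."""
    return ROUTING_MATRIX.get((source, target), DEFAULT_CHAIN)
```

Failover is then just walking the returned list until a call succeeds.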

Caching partial translations cut API calls by about 35%, which was critical for keeping costs sane when the family chatters nonstop.
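
A sketch of that caching layer, assuming Redis as the backend (which the stack already uses); cache_key and translate_cached are hypothetical names:

```python
import hashlib

import redis.asyncio as redis  # redis-py's async client

r = redis.Redis()


def cache_key(text: str, source: str, target: str) -> str:
    digest = hashlib.sha256(text.encode()).hexdigest()
    return f"mt:{source}:{target}:{digest}"


async def translate_cached(text: str, source: str, target: str, translate) -> str:
    """Return a cached translation if one exists, else call the provider."""
    key = cache_key(text, source, target)
    if (hit := await r.get(key)) is not None:
        return hit.decode()
    result = await translate(text, source, target)
    await r.setex(key, 3600, result)  # 1-hour TTL; partials go stale fast
    return result
```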

The first live test

One week after kickoff, I opened the app on my laptop, handed a tablet to my dad, and spun up the room on a TV in our living room. Within minutes:

  • My dad spoke in Polish, and the captions appeared in English for my wife and Arabic for her parents.
  • My mother-in-law told a story in Arabic, generating Polish and English transcripts simultaneously.
  • When I stepped out to cook dinner, conversation continued without me. Victory.

Lessons learned

  1. Multi-provider pays off. Speechmatics delivered higher-quality Polish transcripts, while GPT-4o-mini handled Arabic idioms gracefully.
  2. Operational dashboards matter. The built-in cost and latency trackers answered skeptical questions from the family CFO (my dad).
  3. UX is half the battle. Presence indicators, room language flags, and toast notifications kept everyone confident that the system was listening.

What’s next

  • Packaging the stack for friends with multilingual families.
  • Adding on-device voice activity detection (VAD) to reduce ambient noise before it hits the cloud.
  • Shipping an offline cache so internet blips don’t break the flow.

LiveTranslator turned a stressful visit into the effortless family holiday I always wanted. Now when my parents land in Cairo, they’re greeted not just with hugs but with a seamless translation layer that lets them connect with everyone in the room.
