Translator Device

Research and constraints

Project Constraints

The primary limitations below will influence the project's design, scope, and execution.

Budget

  • Component selection (ESP32 model, microphone, display)
  • Overall quality and feature set
  • Prototyping costs and potential iterations

Parts & Components

  • Availability and shipping lead times
  • Part quality and reliability
  • Hardware compatibility and pin conflicts

Time

  • Software development and API integration
  • Learning curve for new technologies
  • Assembly, testing, and refinement cycles

Coding Ability

  • Handling real-time audio streams
  • Managing multiple cloud APIs and errors
  • Choice of IDE (Arduino vs. ESP-IDF)

Tool Access

  • Requires soldering iron, multimeter, etc.
  • 3D printer needed for custom enclosure
  • Available tools limit physical build quality and ease of assembly

Safety

  • LiPo battery charging and management
  • Proper wiring and short-circuit protection
  • Enclosure design for electronic safety

Investigation & Research

Understanding the current landscape of translation devices and related technologies is crucial for making informed design decisions. This research phase explores existing solutions, technical possibilities, and contextual factors that will shape our approach.

Existing Solutions & Case Studies

Select Earbuds

What it does: Real-time translation through earbuds connected to smartphone

Hardware: Bluetooth earbuds, smartphone dependency
Interface: Voice commands, touch controls
Pros: High accuracy, seamless integration
Cons: Requires smartphone, expensive

Pocketalk Translator

What it does: Dedicated handheld translation device with built-in SIM

Hardware: Custom device, cellular modem, touchscreen
Interface: Physical buttons, 2.4" display
Pros: Standalone, global connectivity
Cons: Subscription required, bulky

Travis Touch Go

What it does: Pocket translator with offline capabilities

Hardware: ARM processor, 3.5" touchscreen, dual mics
Interface: Touch interface, voice activation
Pros: Offline mode, multiple languages
Cons: Limited offline accuracy, expensive

M2 Language Translator Earbuds

What it does: Two-way translation earbuds for conversations

Hardware: Bluetooth 5.0, noise cancellation
Interface: App-based control, touch sensors
Pros: Discreet, conversation-focused
Cons: App dependency, battery life

DIY ESP32 Voice Projects

What it does: Community projects using ESP32 for voice processing

Hardware: ESP32, I2S microphones, MAX98357A amp
Interface: Physical buttons, LED indicators
Pros: Low cost, customisable, educational
Cons: Limited processing, complex setup

Inspiration & Technical Exploration

System Architecture Exploration

Based on the devices researched above, several architectural approaches can be considered. The following are the most promising.

Hardware Architecture Concepts

Concept 1: Minimal ESP32 Design
  • Pros: Low cost (~$30), simple assembly
  • Cons: No visual feedback, limited user interface
  • Use case: Proof of concept, basic functionality testing
Concept 2: Enhanced Display Design
  • Pros: Full UI, language selection, status display
  • Cons: Higher cost (~$60), more complex assembly
  • Use case: Production-ready device with full features

Software Architecture Flow

Figure: Translation Process Flow diagram

Key Technical Decisions:

  • Use FreeRTOS tasks for parallel processing
  • Implement circular buffer for audio streaming
  • Add retry logic with exponential backoff
  • Cache common phrases to reduce API calls
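
The phrase-caching decision can be illustrated with a small least-recently-used cache. This is a host-side Python sketch for checking the logic, not firmware; the names `PhraseCache` and `translate_cached`, the capacity of 64 entries, and the text normalisation rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class PhraseCache:
    """Small LRU cache keyed by (source language, target language, phrase)."""

    def __init__(self, capacity=64):  # capacity is an assumed value
        self.capacity = capacity
        self._items = OrderedDict()

    @staticmethod
    def _key(src, dst, phrase):
        # Normalise case and whitespace so "Thank you " and "thank you" share an entry.
        return (src, dst, " ".join(phrase.lower().split()))

    def get(self, src, dst, phrase):
        key = self._key(src, dst, phrase)
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, src, dst, phrase, translation):
        key = self._key(src, dst, phrase)
        self._items[key] = translation
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def translate_cached(cache, translate_fn, src, dst, phrase):
    """Return a cached translation, calling translate_fn only on a miss."""
    hit = cache.get(src, dst, phrase)
    if hit is not None:
        return hit
    result = translate_fn(src, dst, phrase)
    cache.put(src, dst, phrase, result)
    return result
```

Here `translate_fn` stands in for whichever cloud translation call is eventually used, so a repeated phrase costs one API call instead of several.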

Visual Design Inspiration & Component Photos

  • ESP32-S3 Development Board
  • INMP441 I2S Microphone
  • MAX98357A I2S Amplifier
  • 2.4" ILI9341 TFT Display

  • Smartphone Style Enclosure: Rectangular, 120x60x15mm, touchscreen interface
  • Compact Pod Enclosure: Compact, 80x80x25mm, button-based interface, smaller screen
  • Pendant Style Enclosure: Wearable, 50x30x20mm, single-button language switch

Warning

The devices shown are third-party products.

Development Platform Options

Arduino IDE

Pros: Simple, familiar, good libraries

Cons: Limited debugging, basic audio support

ESP-IDF

Pros: Full control, advanced audio, debugging

Cons: Steeper learning curve, more complex

PlatformIO

Pros: Best of both, VS Code integration

Cons: Additional setup complexity

Visual Studio Code

Pros: Many extensions to assist with coding

Cons: Non-native platform

API Service Comparison

Google Cloud APIs

Accuracy: 95%+ speech, 90%+ translation

Cost: $1.44/hour speech recognition

Azure Cognitive Services

Accuracy: 93%+ speech, 88%+ translation

Cost: $1.00/hour speech recognition

Amazon Polly/Transcribe

Accuracy: 92%+ speech, 85%+ translation

Cost: $0.96/hour speech recognition

Multiple Providers

Accuracy: Varies

Cost: Typically most cost effective
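
To put the hourly prices in context, the following sketch converts them into a cost for a batch of short utterances. The per-hour figures are the ones listed above; the usage pattern (100 utterances of 5 seconds each) is an assumption for illustration, and provider billing increments or free tiers are not modelled.

```python
# Speech-recognition prices per audio hour, as listed in the comparison above.
PRICE_PER_HOUR_USD = {
    "Google Cloud": 1.44,
    "Azure": 1.00,
    "Amazon Transcribe": 0.96,
}


def speech_cost_usd(price_per_hour, utterance_seconds, utterances):
    """Cost of transcribing `utterances` clips of `utterance_seconds` each."""
    return price_per_hour * (utterance_seconds * utterances) / 3600.0


if __name__ == "__main__":
    # Assumed usage: 100 utterances of 5 seconds each (illustrative only).
    for name, price in PRICE_PER_HOUR_USD.items():
        print(f"{name}: ${speech_cost_usd(price, 5, 100):.2f}")
```

At this assumed usage level the speech-recognition cost is a fraction of a dollar for every provider, so accuracy and reliability are likely to matter more than price when choosing between them.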

Assembly Process

1 Breadboard prototype testing
2 PCB design and ordering
3 Component soldering
4 Enclosure 3D printing
5 Final assembly & testing

Power Consumption Analysis

Warning

Whilst these numbers are taken from the component datasheets, they may not reflect all real-world use cases.

Component Power Draw

ESP32-S3 (Active): 240mA @ 3.3V
ESP32-S3 (Sleep): 10μA @ 3.3V
TFT Display (On): 50mA @ 3.3V
Audio Components: 80mA @ 3.3V
Total (Active): 370mA @ 3.3V

Battery Life Estimation

2000mAh LiPo (Active): ~5.4 hours
With 50% sleep time: ~10.8 hours
Standby (sleep mode): >1 month

Strategy: Offer optional sleep modes between translations, display auto-off after 30s, and Wi-Fi power saving, so the end user can maximise battery life when desired.
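
The battery life estimates can be reproduced with a simple duty-cycle model: average current is the active and sleep currents weighted by the time spent in each state. This sketch uses the datasheet figures listed above and assumes an ideal battery, no regulator losses, and that the display and audio components draw nothing while the ESP32-S3 sleeps.

```python
def battery_life_hours(capacity_mah, active_ma, sleep_ma, sleep_fraction):
    """Ideal runtime from a duty-cycle-weighted average current."""
    avg_ma = active_ma * (1.0 - sleep_fraction) + sleep_ma * sleep_fraction
    return capacity_mah / avg_ma


CAPACITY_MAH = 2000
ACTIVE_MA = 240 + 50 + 80  # ESP32-S3 + TFT display + audio = 370 mA
SLEEP_MA = 0.010           # 10 µA in sleep

if __name__ == "__main__":
    for label, fraction in [("always active", 0.0), ("50% sleep", 0.5), ("always asleep", 1.0)]:
        hours = battery_life_hours(CAPACITY_MAH, ACTIVE_MA, SLEEP_MA, fraction)
        print(f"{label}: {hours:.1f} h")
```

Under this ideal model the always-active case gives about 5.4 hours and the 50% sleep case about 10.8 hours. Real figures will be lower once regulator efficiency, Wi-Fi transmit peaks, and battery ageing are included.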

Analysis of Findings

Key Design Influences from Research

Hardware Design Decisions
  • I2S Audio Pipeline: INMP441 → ESP32-S3 → MAX98357A keeps the audio path digital from microphone to amplifier, which should give cleaner audio than the analogue microphones common in DIY builds.
  • Display Integration: 2.4" TFT shows both the transcription and the translation as text, so the user can judge how far to trust the result.
  • Standalone Design: No smartphone dependency avoids pairing and connectivity issues, reduces privacy concerns, and improves ease of use
  • Physical Controls: Dedicated push-to-talk button ensures reliable operation under stress
  • Compact Form Factor: Pocket-sized design inspired by successful devices like Pocketalk
Software Architecture Decisions
  • Google Cloud APIs: Primary choice for highest accuracy rates (95%+ speech recognition)
  • Azure Fallback: Redundancy system prevents single-point-of-failure
  • ESP-IDF Platform: Advanced audio handling capabilities justify learning curve
  • FreeRTOS Tasks: Parallel processing prevents UI blocking during API calls
  • Progressive UI: Visual status indicators address user uncertainty during processing
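
The Azure fallback decision amounts to trying providers in priority order and moving on when one fails. A minimal sketch of that idea follows; the function name `transcribe_with_fallback` and the `(name, callable)` provider list are hypothetical, and the callables stand in for real cloud clients that are not shown here.

```python
def transcribe_with_fallback(audio, providers):
    """Try each (name, callable) provider in order.

    Returns (provider_name, text) from the first provider that succeeds and
    raises RuntimeError listing every failure if none do.
    """
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text lets the UI show which service produced the result, which may help when comparing accuracy during testing.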

Critical Technical Challenges

Real-time Audio Processing

Challenge: The ESP32-S3 has limited internal SRAM (512KB) for audio buffering

Solution: Implement circular buffer with 4KB chunks, stream directly to API without full file storage

Implementation: Use I2S DMA for hardware-level audio capture, FreeRTOS queue for buffer management
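
The circular buffer with 4KB chunks can be prototyped on a host machine before being ported to firmware. The sketch below is Python rather than ESP-IDF C and does not touch I2S DMA or FreeRTOS queues; the class name `RingBuffer`, the helper `drain_chunks`, and the policy of rejecting bytes that do not fit are assumptions made for illustration.

```python
class RingBuffer:
    """Fixed-capacity byte ring buffer for streaming audio."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._read = 0  # index of the oldest unread byte
        self._size = 0  # number of bytes currently stored

    def __len__(self):
        return self._size

    def write(self, data):
        """Store as much of data as fits; return the number of bytes accepted."""
        n = min(len(data), self._capacity - self._size)
        write_pos = (self._read + self._size) % self._capacity
        first = min(n, self._capacity - write_pos)  # bytes before wrap-around
        self._buf[write_pos:write_pos + first] = data[:first]
        self._buf[0:n - first] = data[first:n]      # remainder wraps to the start
        self._size += n
        return n

    def read(self, max_bytes):
        """Remove and return up to max_bytes of the oldest data."""
        n = min(max_bytes, self._size)
        first = min(n, self._capacity - self._read)
        out = bytes(self._buf[self._read:self._read + first]) + bytes(self._buf[0:n - first])
        self._read = (self._read + n) % self._capacity
        self._size -= n
        return out


def drain_chunks(rb, chunk_bytes=4096):
    """Yield full chunks while enough data is buffered, e.g. for upload to an API."""
    while len(rb) >= chunk_bytes:
        yield rb.read(chunk_bytes)
```

A capture task would call `write` with each block of samples and a network task would iterate over `drain_chunks`, so only a few chunks ever sit in memory at once rather than a full recording.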

Network Reliability

Challenge: Wi-Fi dropouts during translation cause user frustration

Solution: Exponential backoff retry (1s, 2s, 4s), connection health monitoring, user feedback

Implementation: Background task monitors RSSI, preemptive reconnection, cached translation pairs
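
The 1s, 2s, 4s retry schedule can be sketched as a small wrapper around any network call. This is a host-side illustration with a hypothetical `retry_with_backoff` helper; the `sleep` parameter is injectable so the logic can be tested without real delays, and RSSI monitoring and reconnection are not covered.

```python
import time


def retry_with_backoff(operation, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Call operation(); after each failure wait the next delay and try again.

    With the default delays this makes up to four attempts and re-raises the
    last error if all of them fail.
    """
    last_exc = None
    for attempt in range(len(delays) + 1):
        try:
            return operation()
        except Exception as exc:  # a real client would catch narrower error types
            last_exc = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    raise last_exc
```

Between attempts the device can update its status display, so the user sees that a retry is in progress rather than an unexplained pause.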

Power Management

Challenge: Constant Wi-Fi + display + audio processing drains battery quickly

Solution: Aggressive sleep states, display dimming, Wi-Fi power save mode

Implementation: Modem sleep between translations, 5-second display timeout, 240MHz → 80MHz scaling

Audio Quality Control

Challenge: Background noise and varying input levels affect recognition accuracy

Solution: Software AGC (Automatic Gain Control), noise gate, pre-processing filters

Implementation: Digital filters in ESP32, volume normalisation, silence detection thresholds
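
A noise gate and volume normalisation step can be sketched per audio frame as below. This is a simplification: it scales each frame independently, whereas a full AGC would smooth the gain over time. The function name `process_frame` and the threshold values (`gate_rms`, `target_rms`, `max_gain`) are assumed starting points that would need tuning against real recordings.

```python
import math


def rms(samples):
    """Root-mean-square level of a frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def process_frame(samples, gate_rms=200.0, target_rms=3000.0, max_gain=8.0):
    """Apply a noise gate, then scale the frame towards a target level.

    samples: list of signed 16-bit PCM values. All thresholds are assumptions.
    """
    level = rms(samples)
    if level < gate_rms:
        return [0] * len(samples)  # below the gate: treat the frame as silence
    gain = min(target_rms / level, max_gain)  # cap gain so noise is not over-amplified
    # Clamp to the signed 16-bit range after scaling.
    return [max(-32768, min(32767, int(round(s * gain)))) for s in samples]
```

Frames that come back as all zeros can also feed silence detection, for example to stop recording after a run of gated frames.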

Design Pitfalls to Avoid

Complex Menu Systems

Problem: Multi-level menus slow down urgent translation needs

Avoidance: Maximum 2-level menu depth, easy to navigate, clear visual hierarchy

Reference: Travis Touch Go suffers from deep menu navigation

Subscription Dependencies

Problem: Monthly fees create barriers to usage and ownership

Avoidance: Pay-per-use API model, user controls their own API keys, offline fallback

Reference: Pocketalk's $50/year subscription reduces adoption

Fragile Construction

Problem: Travel devices need to survive drops and environmental stress

Avoidance: Reinforced corners, flexible materials, internal shock mounting for electronics

Reference: Many DIY projects fail due to poor mechanical design and circuitry

Unclear Status Indicators

Problem: Users lose confidence when they can't see device state

Avoidance: Always-visible status, progress bars, clear error messages

Reference: Wireless earbuds (e.g. Samsung Buds 4) provide minimal feedback, which can confuse users

Poor Audio Quality

Problem: Low-quality microphones cause speech recognition failures

Avoidance: Professional-grade I2S microphone, proper acoustic design, noise cancellation

Reference: Arduino projects often use poor analogue microphones

Implementation Strategy Based on Findings

Phase 1: Core Functionality
  • Build minimal hardware (ESP32-S3 + INMP441 + MAX98357A)
  • Implement basic audio capture and playback
  • Test ElevenLabs and LLMs
  • Validate translation pipeline end-to-end
  • Measure latency and audio quality
Phase 2: User Interface
  • Add 2.4" TFT display for visual feedback
  • Implement language selection interface
  • Add status indicators and progress bars
  • Create error handling and retry logic
  • Test usability with target users
Phase 3: Production Ready
  • Design and 3D print professional enclosure
  • Implement power management and battery charging
  • Add fallback API services for reliability
  • Optimise for the sub-3-second translation target
  • Conduct durability and field testing

Success Criteria

Success for this project means not just creating a working device, but one that meets specific, measurable targets. This section defines the benchmarks for functionality, performance, usability, and adherence to the project constraints. The device is considered successful if these criteria are met.

1. Functionality

Does the device perform its core tasks correctly?

Core Feature Checklist

  • Audio capture starts/stops on command.
  • Device connects to Wi-Fi.
  • Audio sent to speech-to-text service.
  • Text sent to translation service.
  • Translated text sent to text-to-speech.
  • Translated audio is played.
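
The checklist above describes a linear pipeline, which can be sketched as a list of stages with per-stage timing so that latency can be compared against the speed target. The stage callables are placeholders for the real capture, cloud, and playback code; the names `run_translation` and `within_target` are hypothetical, and the 3.0 second default mirrors the translation speed target in this document.

```python
import time


def run_translation(audio, stages, clock=time.monotonic):
    """Pass data through (name, callable) stages in order.

    Returns (final_result, timings), where timings maps each stage name to
    its duration in seconds and "total" to the end-to-end duration.
    """
    timings = {}
    data = audio
    start = clock()
    for name, stage in stages:
        t0 = clock()
        data = stage(data)
        timings[name] = clock() - t0
    timings["total"] = clock() - start
    return data, timings


def within_target(timings, target_s=3.0):
    """True if the end-to-end time is under the target."""
    return timings["total"] < target_s
```

A typical stage list would be speech-to-text, translation, text-to-speech, and playback, in that order. Recording per-stage times shows which service dominates the delay when the total exceeds the target.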

Accuracy Targets

Translation: >90%

Speech Recognition: >95%

2. Performance & Reliability

How well and how consistently does the device work?

Translation Speed: < 3.0s

Wi-Fi Reconnect: < 15s

Operational Stability: 1 hr minimum continuous use

3. Usability

Is the device practical and easy for someone to use?

Device Setup: < 60s to success

Portability: Pocket friendly

Battery Life: 1+ hr active use

4. Adherence to Constraints

Did the project meet its initial goals and limits?

Final Budget: On Target ($100 maximum)

Timeline: On Target (project started)