Local Voice Service

Streaming ASR and TTS on edge hardware: sub-180 ms latency on Jetson, fully offline, with no cloud dependency. Supports Jetson Orin, RK3576, RK3588, and Raspberry Pi.

Beginner · 15 min · Voice AI
Tags: Voice · Jetson · asr · tts · local · rk3576 · rk3588 · raspberry-pi

What This Capability Does

Adds a “listen and speak” capability to robots, devices, and applications. Speech recognition and speech synthesis run on your local edge device, so after deployment no audio has to be sent to the cloud.

What You Get After Deployment

  • A local listening service: send microphone audio in and receive recognized text.
  • A local speaking service: send text in and receive playable speech.
  • Standard interfaces for robots, web apps, kiosks, industrial systems, or your own AI conversation flow.
  • Offline operation after the first deployment downloads the image and models.

Where It Fits

Scenario | How to Use It
Voice-controlled robots | Turn spoken commands into text, send them to your control logic or LLM, then speak the response back
Smart kiosks | Let visitors ask questions out loud, query a knowledge base locally, and hear the answer
Industrial voice commands | Trigger actions by voice when operators cannot use a screen or keyboard
Private voice entry point | Let multiple devices send audio to one edge device for centralized listening and speaking
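The voice-controlled robot row describes a simple loop: wait for a final transcript, hand it to your logic, then speak the reply. The Python sketch below shows only that routing step. `respond` and `speak` are hypothetical placeholders for your control logic or LLM and for a call to the TTS endpoint, and the input messages are assumed to be the JSON strings shown for `/asr/stream`.

```python
import json
from typing import Callable, Iterable, List


def run_voice_loop(
    asr_messages: Iterable[str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Route each final transcript to `respond` and speak the reply.

    `asr_messages` are JSON strings shaped like the /asr/stream output.
    Returns the final transcripts that were handled, in order.
    """
    handled: List[str] = []
    for raw in asr_messages:
        msg = json.loads(raw)
        # Partial results may still change, so act only on final ones.
        if not msg.get("is_final"):
            continue
        text = msg.get("text", "").strip()
        if not text:
            continue
        handled.append(text)
        reply = respond(text)  # hypothetical: your control logic or LLM
        if reply:
            speak(reply)  # hypothetical: e.g. a POST to /tts/stream
    return handled
```

Because `respond` and `speak` are passed in, the same loop can be exercised with stand-in functions before it is wired to a real LLM and speaker.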

Interfaces for Your Application

Capability | How to Connect | Port / Path | Output
Live transcription | WebSocket | :8621/asr/stream | Recognized text as it arrives
Live speech playback | HTTP POST | :8621/tts/stream | Playable audio stream
Generate speech file | HTTP POST | :8621/tts | WAV file
Upload audio for transcription | HTTP POST | :8621/asr | Recognized text
Service status | HTTP GET | :8621/health | Readiness status
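Since every interface shares port 8621, a small lookup table keeps the endpoints in one place on the client side. This is a sketch: the endpoint keys are hypothetical labels, and plain `ws://` and `http://` schemes (no TLS) are an assumption, since the table lists only a port and path.

```python
from typing import Dict, Tuple

PORT = 8621  # single port for every endpoint, per the table

# Hypothetical keys -> (URL scheme, method, path) from the interface table.
ENDPOINTS: Dict[str, Tuple[str, str, str]] = {
    "asr_stream": ("ws", "WebSocket", "/asr/stream"),
    "tts_stream": ("http", "POST", "/tts/stream"),
    "tts": ("http", "POST", "/tts"),
    "asr": ("http", "POST", "/asr"),
    "health": ("http", "GET", "/health"),
}


def endpoint_url(host: str, name: str, port: int = PORT) -> str:
    """Build the full URL for one endpoint on a given device."""
    scheme, _method, path = ENDPOINTS[name]
    return f"{scheme}://{host}:{port}{path}"


def endpoint_method(name: str) -> str:
    """Return the connection method listed for an endpoint."""
    return ENDPOINTS[name][1]
```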

Technical Specs

Spec | Jetson Orin NX | RK3588 | RK3576 | Raspberry Pi 5
Speech to text | Paraformer / Qwen3 (TensorRT) | Qwen3 (RKNN) | Qwen3 (RKNN) | Paraformer (ONNX)
Text to speech | Matcha-TTS / Qwen3 (TensorRT) | Matcha (RKNN) | Matcha (RKNN) | Matcha (ONNX)
Voice-to-voice latency (p50) | 58 ms | 394 ms | 1099 ms | not listed
Memory required | 2 GB | 6 GB | 4 GB | 2 GB
Disk required | 7.5 GB | 4.4 GB | 4.4 GB | 2.8 GB
Languages | zh+en / 52 (Qwen3) | zh+en / 52 (Qwen3) | zh+en / 52 (Qwen3) | zh+en

Supported hardware: Jetson Orin Nano/NX/AGX · RK3576 · RK3588 · Raspberry Pi 4/5
Network: Internet access is needed for the first deployment (to download the image and models). Works fully offline after setup.
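Before deploying, a device can be checked against the memory and disk figures in the specs table. This Python sketch copies those numbers from the table. Whether "Memory required" means total or free RAM is not stated, and treating 1 GB as 10^9 bytes is an assumption; the platform keys and function names are hypothetical.

```python
import shutil
from typing import Dict, List

# Memory and disk requirements from the Technical Specs table, in GB.
REQUIREMENTS_GB: Dict[str, Dict[str, float]] = {
    "Jetson Orin NX": {"memory": 2, "disk": 7.5},
    "RK3588": {"memory": 6, "disk": 4.4},
    "RK3576": {"memory": 4, "disk": 4.4},
    "Raspberry Pi 5": {"memory": 2, "disk": 2.8},
}


def shortfalls(platform: str, memory_gb: float, free_disk_gb: float) -> List[str]:
    """List every requirement the device misses; empty means it fits."""
    req = REQUIREMENTS_GB[platform]
    problems: List[str] = []
    if memory_gb < req["memory"]:
        problems.append(f"memory: need {req['memory']} GB, have {memory_gb} GB")
    if free_disk_gb < req["disk"]:
        problems.append(f"disk: need {req['disk']} GB, have {free_disk_gb} GB")
    return problems


def free_disk_gb(path: str = "/") -> float:
    """Free space at `path`, using 1 GB = 10**9 bytes (an assumption)."""
    return shutil.disk_usage(path).free / 1e9
```

For example, `shortfalls("RK3588", 8, free_disk_gb())` returns an empty list when the board has at least 6 GB of RAM and 4.4 GB of free disk.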

Integration Interfaces

/asr/stream · Port: 8621 · Protocol: WebSocket
Real-time streaming speech recognition (int16 PCM in, JSON out)
Example output: {"text":"hello world","is_final":true,"is_stable":true}

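The listing specifies int16 PCM in and JSON out, but not the sample rate, byte order, or chunk size. The Python sketch below assumes 16 kHz mono, little-endian samples, and 100 ms chunks (all assumptions to check against your deployment). It shows how to pack float audio into int16 frames and how to fold the `text` / `is_final` messages into a running transcript; the function and class names are hypothetical.

```python
import json
import struct
from typing import Iterator, List

# Assumptions (not stated in the listing): 16 kHz mono, little-endian
# int16 samples, sent in 100 ms chunks.
SAMPLE_RATE = 16_000
CHUNK_MS = 100
BYTES_PER_SAMPLE = 2  # int16


def float_to_int16_pcm(samples: List[float]) -> bytes:
    """Clip float samples to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_chunks(pcm: bytes, sample_rate: int = SAMPLE_RATE,
               chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split PCM bytes into chunk_ms-long pieces (the last may be shorter)."""
    step = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


class TranscriptBuffer:
    """Fold /asr/stream messages into finished text plus the current partial."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial = ""

    def update(self, raw: str) -> None:
        msg = json.loads(raw)
        text = msg.get("text", "")
        if msg.get("is_final"):
            # Assumed: a final result supersedes the partial before it.
            if text:
                self.finals.append(text)
            self.partial = ""
        else:
            self.partial = text

    def text(self) -> str:
        # Joined with spaces; adjust the separator for Chinese output.
        return " ".join(self.finals + ([self.partial] if self.partial else []))
```

Each chunk from `pcm_chunks` would be sent as one binary WebSocket message using a client of your choice (for example the third-party `websockets` package), and each text message received would go to `TranscriptBuffer.update`. How the server expects the end of an utterance to be signaled is not covered here.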
/tts/stream · Port: 8621 · Method: POST · Protocol: HTTP (streaming response)
Streaming text-to-speech (JSON in, raw PCM stream out)

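Because `/tts/stream` returns raw PCM rather than a WAV file, a client can start playback before synthesis finishes, but it has to add a header itself if it wants to save the audio. In this sketch the `{"text": ...}` request body, mono output, and the function names are assumptions, not documented values. The sample rate is left as a required argument because the listing does not state it.

```python
import io
import json
import urllib.request
import wave
from typing import Iterator


def tts_stream_request(host: str, text: str, port: int = 8621) -> urllib.request.Request:
    """Build the POST request; the {"text": ...} body is an assumption."""
    return urllib.request.Request(
        f"http://{host}:{port}/tts/stream",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_pcm(req: urllib.request.Request, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the raw PCM body in chunk_size pieces while the server streams it."""
    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap int16 PCM in a WAV header; pass the rate your deployment uses."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # int16 samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return out.getvalue()
```

Feeding each chunk from `iter_pcm` to an audio sink starts playback early; joining them with `b"".join(...)` and passing the result to `pcm_to_wav` saves the whole utterance instead.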
/tts · Port: 8621 · Method: POST · Protocol: HTTP
Batch text-to-speech (JSON in, WAV out)

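For the batch endpoint, the whole WAV file comes back in one response. The sketch below assumes the same hypothetical `{"text": ...}` request body, checks the RIFF/WAVE magic bytes before trusting the payload, and reads the clip duration with the standard `wave` module. The function names are illustrative.

```python
import io
import json
import urllib.request
import wave


def is_wav(data: bytes) -> bool:
    """Check the RIFF/WAVE magic bytes at the start of a WAV file."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def synthesize_wav(host: str, text: str, port: int = 8621,
                   timeout: float = 30.0) -> bytes:
    """POST text to /tts and return the WAV bytes (request body is assumed)."""
    req = urllib.request.Request(
        f"http://{host}:{port}/tts",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    if not is_wav(data):
        raise ValueError("expected a WAV file from /tts")
    return data


def wav_duration_seconds(data: bytes) -> float:
    """Duration = frame count / frame rate, read from the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()
```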
/health · Port: 8621 · Method: GET · Protocol: HTTP
Service health check (returns ASR, TTS, and streaming ASR readiness)
Example output: {"asr":true,"tts":true,"streaming_asr":true}
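The health response reports each component separately, so a client can poll it after boot and wait until everything it needs is ready. This sketch requires every reported flag to be `true`; `wait_until_ready`, its timeout, and its polling interval are hypothetical choices, not part of the service.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_health(host: str, port: int = 8621, timeout: float = 2.0) -> Dict[str, bool]:
    """GET /health and decode its JSON body."""
    with urllib.request.urlopen(f"http://{host}:{port}/health", timeout=timeout) as resp:
        return json.loads(resp.read())


def is_ready(status: Dict[str, bool]) -> bool:
    """True only if at least one component is reported and all are true."""
    return bool(status) and all(v is True for v in status.values())


def wait_until_ready(fetch: Callable[[], Dict[str, bool]],
                     timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll `fetch` until the service is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            pass  # not listening yet, timed out, or returned non-JSON
        time.sleep(interval_s)
    return False
```

A typical call is `wait_until_ready(lambda: fetch_health("jetson.local"))`, where the host name is only an example. If a client never uses streaming ASR, it could check just the keys it depends on rather than requiring all of them.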

Deployment Configuration
