How it works (single-provider)
The avatar is opt-in per course and uses one provider for both voice and video, so there is no audio/lip desync and no separate TTS step:
- HeyGen → LiveAvatar API (FULL mode): the backend creates a session token and starts a LiveKit room; the client connects to LiveKit for video and speaks by publishing a speak_text event on the LiveKit data channel. HeyGen does TTS + video.
- D-ID → clips/streams: the backend proxies SDP/ICE; the tutor speaks with text and D-ID does the TTS.
The client connects to the provider directly over WebRTC, so the video never passes through Studeia's server. The backend only creates the session, proxies speak/SDP/ICE requests (D-ID), and records usage on stop.
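As a rough illustration of the HeyGen path, the sketch below encodes a speak_text message and publishes it through an object shaped like LiveKit's local participant. The DataPublisher interface, the payload fields (type, text) and the reliable flag are assumptions made for illustration; the actual message schema should be taken from the provider's documentation.

```typescript
// Minimal shape of the object we publish through. In livekit-client this role
// is played by room.localParticipant; it is declared locally here so the
// sketch stays self-contained.
interface DataPublisher {
  publishData(data: Uint8Array, options?: { reliable?: boolean }): Promise<void>;
}

// Hypothetical payload: these field names are assumptions, not the provider's schema.
interface SpeakTextMessage {
  type: "speak_text";
  text: string;
}

// Turn the tutor's text into the bytes sent over the data channel.
export function encodeSpeakText(text: string): Uint8Array {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("speak_text requires non-empty text");
  }
  const message: SpeakTextMessage = { type: "speak_text", text: trimmed };
  return new TextEncoder().encode(JSON.stringify(message));
}

// Publish the event; reliable delivery is requested so the text is not dropped.
export async function speak(publisher: DataPublisher, text: string): Promise<void> {
  await publisher.publishData(encodeSpeakText(text), { reliable: true });
}
```

On the client this would be called with the connected room's local participant, for example speak(room.localParticipant, "Hello"), once the LiveKit connection is up.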
Configuration
- Per tenant: connect a HeyGen or D-ID key (encrypted AES-256-GCM), test it, and set a monthly minute cap.
- Per course: Course.avatarProvider, avatarId, avatarVoiceId, avatarQuality and the avatarEnabled flag. One key → many avatars (the avatar is a per-session parameter).
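To make the "encrypted AES-256-GCM" note on tenant keys concrete, here is a minimal sketch using Node's built-in crypto module. The function names, the iv.tag.ciphertext storage format and the way the 32-byte key is supplied are all assumptions for illustration, not a description of how Studeia stores keys.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a provider API key with AES-256-GCM.
// `key` must be 32 bytes; where it comes from (KMS, env var) is out of scope here.
export function encryptApiKey(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag, needed to decrypt
  // Hypothetical storage format: base64 parts joined with dots.
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Decrypt a stored value; throws if the key is wrong or the data was modified.
export function decryptApiKey(stored: string, key: Buffer): string {
  const parts = stored.split(".");
  if (parts.length !== 3) {
    throw new Error("malformed encrypted value");
  }
  const [iv, tag, ciphertext] = parts.map((p) => Buffer.from(p, "base64")) as [Buffer, Buffer, Buffer];
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decryption with the wrong key or a tampered value fails outright instead of returning garbage.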
Security & quota
- The master API key never goes to the client; only ephemeral session/LiveKit tokens do. Speak/SDP/ICE are proxied server-side with AvatarSession.userId ownership checks.
- The monthly cap (monthlyMinuteCap) is checked before starting a session (fail-closed → quota_exceeded). Usage and cost are written to AvatarUsageLog.
- Gating: avatarEnabled + a configured provider/avatar on the course + an active enrollment + the student's opt-in.
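The gating and quota rules above could be combined into a single pre-session check along the following lines. This is a sketch under assumptions: the input shape, the function name and every reason code other than quota_exceeded are hypothetical, and "fail-closed" is read here as refusing the session whenever usage cannot be confirmed to be under the cap.

```typescript
interface CourseAvatarSettings {
  avatarEnabled: boolean;
  avatarProvider: "heygen" | "did" | null;
  avatarId: string | null;
}

interface GateInput {
  course: CourseAvatarSettings;
  enrollmentActive: boolean;
  studentOptedIn: boolean;
  usedMinutesThisMonth: number | null; // null = usage could not be read
  monthlyMinuteCap: number;
}

// Only "quota_exceeded" comes from the text above; the other reasons are illustrative.
type GateResult =
  | { allowed: true }
  | {
      allowed: false;
      reason: "avatar_disabled" | "not_configured" | "not_enrolled" | "not_opted_in" | "quota_exceeded";
    };

export function canStartAvatarSession(input: GateInput): GateResult {
  const { course, usedMinutesThisMonth: used, monthlyMinuteCap: cap } = input;
  if (!course.avatarEnabled) return { allowed: false, reason: "avatar_disabled" };
  if (!course.avatarProvider || !course.avatarId) return { allowed: false, reason: "not_configured" };
  if (!input.enrollmentActive) return { allowed: false, reason: "not_enrolled" };
  if (!input.studentOptedIn) return { allowed: false, reason: "not_opted_in" };
  // Fail closed: unknown or invalid usage is treated the same as an exhausted cap.
  if (used === null || !Number.isFinite(used) || used >= cap) {
    return { allowed: false, reason: "quota_exceeded" };
  }
  return { allowed: true };
}
```

Running the check on the server before any provider call keeps a failed gate from consuming provider minutes.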
Graceful degradation
full_avatar → audio_only (TTS + static image) → text_only, so the tutor always responds even if the avatar provider fails.
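One way to express this fallback chain in code is sketched below. The three mode names come from the chain above; the Capabilities shape and both function names are assumptions made for the example.

```typescript
type TutorMode = "full_avatar" | "audio_only" | "text_only";

// Hypothetical health flags the caller would derive from provider/TTS checks.
interface Capabilities {
  avatarAvailable: boolean; // a provider session could be created
  ttsAvailable: boolean; // standalone TTS for the audio_only tier works
}

// Pick the richest mode that is currently available.
export function pickMode(capabilities: Capabilities): TutorMode {
  if (capabilities.avatarAvailable) return "full_avatar";
  if (capabilities.ttsAvailable) return "audio_only";
  return "text_only"; // always possible, so the tutor can still respond
}

// Step one tier down after a failure; text_only is the floor.
export function degrade(mode: TutorMode): TutorMode {
  switch (mode) {
    case "full_avatar":
      return "audio_only";
    case "audio_only":
      return "text_only";
    case "text_only":
      return "text_only";
  }
}
```

pickMode covers the decision at session start, while degrade covers a failure in the middle of a session.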
Mobile
On mobile the avatar runs in a WebView that loads the same /avatar-embed page used on web (no native WebRTC modules in Expo); a React Native bridge forwards control messages.
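The control messages crossing the bridge could be modelled as a small typed protocol, sketched below. The message kinds and field names are hypothetical, and the sketch assumes a string-based transport such as the one react-native-webview provides (postMessage on the page side, an onMessage handler receiving a string on the native side).

```typescript
// Hypothetical control messages exchanged between the app and /avatar-embed.
type BridgeMessage =
  | { kind: "speak"; text: string }
  | { kind: "stop" }
  | { kind: "status"; state: "connecting" | "ready" | "failed" };

// The bridge carries strings, so messages are sent as JSON.
export function serializeBridgeMessage(message: BridgeMessage): string {
  return JSON.stringify(message);
}

// Validate an incoming string; anything unrecognised yields null instead of throwing.
export function parseBridgeMessage(raw: string): BridgeMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const kind = record["kind"];
  if (kind === "speak") {
    const text = record["text"];
    return typeof text === "string" ? { kind: "speak", text } : null;
  }
  if (kind === "stop") return { kind: "stop" };
  if (kind === "status") {
    const state = record["state"];
    if (state === "connecting" || state === "ready" || state === "failed") {
      return { kind: "status", state };
    }
  }
  return null;
}
```

Validating on both sides means a malformed message from either the page or the app is ignored rather than crashing the session.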
Not yet (roadmap)
Voice input — the student speaking to the tutor (speech → STT → chat) — is not implemented. Today the avatar is output only (talking head + voice).