Studeia Docs

Avatar & TTS: a real-time talking AI tutor

Studeia's AI tutor can answer with a real-time talking avatar (video + voice, live lip-sync) via HeyGen LiveAvatar or D-ID. BYO encrypted key, configured per course, with graceful degradation.

2026-05-31 6 min
Short answer

Studeia's AI tutor can reply with a real-time talking avatar (video and voice with live lip-sync) over WebRTC, via HeyGen LiveAvatar or D-ID. Institutions bring their own provider key (encrypted with AES-256-GCM) and configure the avatar, voice and quality per course. Because a single provider produces both voice and video, audio and lips do not drift apart. The provider's master key never reaches the client, usage is metered against a monthly minute cap, and the tutor degrades full_avatar → audio_only → text_only. Voice input (STT) is on the roadmap.

How it works (single-provider)

The avatar is opt-in per course and uses one provider for both voice and video, so there is no audio/lip desync and no separate TTS step:

  • HeyGen → LiveAvatar API (FULL mode): the backend creates a session token and starts a LiveKit room; the client connects to LiveKit for video and speaks by publishing a speak_text event on the LiveKit data channel. HeyGen does TTS + video.
  • D-ID → clips/streams: the backend proxies SDP/ICE; the tutor sends text and D-ID performs the TTS.

The client connects over WebRTC directly to the provider, so the video never passes through Studeia's server. The backend only creates the session, proxies speak/SDP/ICE (for D-ID) and records usage when the session stops.
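The two speak paths described above can be summarized in a small routing sketch. The provider identifiers ("heygen", "did"), the SpeakRoute type and the routeSpeak function are hypothetical names chosen for illustration; only the speak_text event name and the split between the LiveKit data channel and the backend proxy come from the description above.

```typescript
// Assumed values for Course.avatarProvider; the real identifiers may differ.
type AvatarProvider = "heygen" | "did";

// Hypothetical description of where a speak request travels.
type SpeakRoute =
  | { via: "livekit_data_channel"; event: "speak_text"; text: string }
  | { via: "backend_proxy"; text: string };

function routeSpeak(provider: AvatarProvider, text: string): SpeakRoute {
  switch (provider) {
    case "heygen":
      // The client publishes speak_text itself on the LiveKit data channel.
      return { via: "livekit_data_channel", event: "speak_text", text };
    case "did":
      // The backend forwards the text to D-ID on the client's behalf.
      return { via: "backend_proxy", text };
  }
}
```

In both cases the provider turns the text into speech and video, so the route only decides who delivers the text, not who renders it.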

Configuration

  • Per tenant: connect a HeyGen or D-ID key (encrypted with AES-256-GCM), test it, and set a monthly minute cap.
  • Per course: Course.avatarProvider, avatarId, avatarVoiceId, avatarQuality and the avatarEnabled flag. One key → many avatars (avatar is a per-session parameter).
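As a rough illustration of storing a tenant's provider key encrypted with AES-256-GCM, the sketch below uses Node's built-in crypto module. The function names, the EncryptedKey shape and the separate 32-byte encryption key held by the backend are assumptions for illustration, not Studeia's actual implementation.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage shape: all fields are base64 strings.
interface EncryptedKey {
  iv: string;
  tag: string;
  ciphertext: string;
}

function encryptProviderKey(plain: string, encryptionKey: Buffer): EncryptedKey {
  if (encryptionKey.length !== 32) throw new Error("AES-256 needs a 32-byte key");
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", encryptionKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // authentication tag
    ciphertext: ciphertext.toString("base64"),
  };
}

function decryptProviderKey(enc: EncryptedKey, encryptionKey: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    encryptionKey,
    Buffer.from(enc.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(enc.tag, "base64"));
  // final() throws if the tag does not match (tampered data or wrong key).
  const plain = Buffer.concat([
    decipher.update(Buffer.from(enc.ciphertext, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

GCM is an authenticated mode, so a modified ciphertext or tag fails at decryption time instead of yielding a corrupted key.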

Security & quota

  • The master API key never goes to the client; only ephemeral session/LiveKit tokens do. Speak/SDP/ICE are proxied server-side with AvatarSession.userId ownership checks.
  • The monthly cap (monthlyMinuteCap) is checked before starting a session (fail-closed → quota_exceeded). Usage and cost are written to AvatarUsageLog.
  • Gating: avatarEnabled + a configured provider/avatar on the course + an active enrollment + the student's opt-in.
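The gating and quota rules above could be combined into a single pre-flight check, sketched below. Only avatarEnabled, avatarProvider, avatarId, monthlyMinuteCap and the quota_exceeded result come from the text; the other names and deny reasons are hypothetical. Treating a failed usage lookup as a denial is an assumption about what fail-closed means here.

```typescript
// Only "quota_exceeded" is named in the docs; the other reasons are illustrative.
type DenyReason =
  | "avatar_disabled"
  | "not_configured"
  | "not_enrolled"
  | "not_opted_in"
  | "quota_exceeded";

interface CourseAvatarConfig {
  avatarEnabled: boolean;
  avatarProvider: string | null;
  avatarId: string | null;
}

interface StartRequest {
  course: CourseAvatarConfig;
  hasActiveEnrollment: boolean;
  studentOptedIn: boolean;
  monthlyMinuteCap: number;
  usedMinutes: number | null; // null means the usage lookup failed
}

type StartDecision = { allowed: true } | { allowed: false; reason: DenyReason };

function canStartAvatarSession(req: StartRequest): StartDecision {
  const deny = (reason: DenyReason): StartDecision => ({ allowed: false, reason });
  if (!req.course.avatarEnabled) return deny("avatar_disabled");
  if (!req.course.avatarProvider || !req.course.avatarId) return deny("not_configured");
  if (!req.hasActiveEnrollment) return deny("not_enrolled");
  if (!req.studentOptedIn) return deny("not_opted_in");
  // Fail closed: unknown usage is treated the same as an exhausted cap.
  if (req.usedMinutes === null || req.usedMinutes >= req.monthlyMinuteCap) {
    return deny("quota_exceeded");
  }
  return { allowed: true };
}
```

Running the check before the session is created means a denied request never reaches the provider.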

Graceful degradation

full_avatar → audio_only (TTS + static image) → text_only, so the tutor always responds even if the avatar provider fails.
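One way to express this fallback chain is a selector that walks the modes in order and always lands on text_only. The selectMode function and the canServe callback are hypothetical; only the three mode names and their order come from the text above.

```typescript
type TutorMode = "full_avatar" | "audio_only" | "text_only";

const FALLBACK_ORDER: readonly TutorMode[] = ["full_avatar", "audio_only", "text_only"];

// canServe is a caller-supplied availability check (an assumption of this sketch).
function selectMode(canServe: (mode: TutorMode) => boolean): TutorMode {
  for (const mode of FALLBACK_ORDER) {
    if (mode === "text_only") break; // the floor needs no check
    try {
      if (canServe(mode)) return mode;
    } catch {
      // A check that throws is treated as "unavailable"; try the next mode.
    }
  }
  return "text_only";
}
```

Because text_only is returned unconditionally at the end, a provider outage can lower the richness of the reply but cannot prevent one.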

Mobile

On mobile the avatar runs in a WebView that loads the same /avatar-embed page used on web (no native WebRTC modules in Expo); a React Native bridge forwards control messages.
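A bridge like this usually exchanges small JSON strings between the embedded page and the native app. The sketch below shows one possible message format with defensive parsing. The message names, their fields and both helper functions are hypothetical, since the actual control messages are not documented here.

```typescript
const MODES = ["full_avatar", "audio_only", "text_only"] as const;
type TutorMode = (typeof MODES)[number];

// Hypothetical control messages; the real bridge protocol may differ.
type BridgeMessage =
  | { type: "avatar_ready" }
  | { type: "avatar_mode"; mode: TutorMode }
  | { type: "avatar_error"; message: string };

function encodeBridgeMessage(msg: BridgeMessage): string {
  return JSON.stringify(msg);
}

// Returns null for anything that is not a well-formed bridge message.
function parseBridgeMessage(raw: string): BridgeMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const m = data as Record<string, unknown>;
  switch (m.type) {
    case "avatar_ready":
      return { type: "avatar_ready" };
    case "avatar_mode": {
      const mode = MODES.find((x) => x === m.mode);
      return mode ? { type: "avatar_mode", mode } : null;
    }
    case "avatar_error":
      return typeof m.message === "string"
        ? { type: "avatar_error", message: m.message }
        : null;
    default:
      return null;
  }
}
```

With react-native-webview, the embedded page would typically send such a string through window.ReactNativeWebView.postMessage, and the app would read it from event.nativeEvent.data in the WebView's onMessage handler.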

Not yet (roadmap)

Voice input — the student speaking to the tutor (speech → STT → chat) — is not implemented. Today the avatar is output only (talking head + voice).

FAQ

Does Studeia's AI tutor have a talking avatar?

Yes. The tutor can answer with a real-time talking avatar — video plus voice with live lip-sync — over WebRTC. Institutions connect their own HeyGen (LiveAvatar) or D-ID account (a BYO key encrypted at rest), map avatar/voice per course, and enable it per course. It is single-provider: the provider does both TTS and video, so audio and lips stay in sync.

Is the avatar safe and private?

The provider's master API key never reaches the client: session creation and speak/SDP/ICE are proxied server-side with session ownership checks, and only ephemeral session tokens reach the browser. Credentials are encrypted with AES-256-GCM, usage is metered (AvatarUsageLog) against a monthly cap, and access is gated by enrollment plus the student's opt-in.

What happens if the avatar provider is unavailable?

It degrades gracefully: full_avatar → audio_only (TTS + static image) → text_only. Voice input (the student speaking → speech-to-text) is a separate, not-yet-implemented feature on the roadmap; today the avatar is output only.
