AI Tutor: multi-agent pipeline

Pipeline in detail

Pre-LLM (synchronous, zero LLM cost)

StudentModelService.getSnapshot(userId, courseId) loads ConceptMastery (Beta distribution per concept), active/resolving misconceptions, EpisodicMemory, quizContext (avgScore, passRate, weakAreas), and ChatSession history.
RetrievalAgent.retrieve(query, studentModel) reformulates the query, searches pgvector with tenant + course filters, and boosts results by quizWeakAreas.
PedagogicalAgent.select(studentModel, query) evaluates mastery + applies adjustments (chat-quiz divergence, age, learning style) → returns strategy.
buildEnrichedPrompt assembles system prompt with strategy + RAG context + active misconceptions + recent quiz attempts.
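
The prompt assembly step can be illustrated with a minimal sketch. The input shape, field names, section titles, and wording below are assumptions made for illustration; only the four ingredients (strategy, RAG context, active misconceptions, recent quiz attempts) come from the description above.

```typescript
// Hypothetical input shape; field names are assumptions, not the real schema.
interface PromptInputs {
  strategy: string; // output of PedagogicalAgent.select
  ragChunks: string[]; // passages returned by RetrievalAgent.retrieve
  activeMisconceptions: string[]; // taken from the student snapshot
  recentQuizAttempts: { quiz: string; score: number }[]; // score in [0, 1]
}

// Render one titled section, or nothing when the list is empty,
// so the prompt never carries an empty heading.
function section(title: string, lines: string[]): string[] {
  if (lines.length === 0) return [];
  return [`${title}:`, ...lines.map((l) => `- ${l}`), ""];
}

// Assemble the system prompt from the four ingredients (illustrative wording).
function buildEnrichedPrompt(inputs: PromptInputs): string {
  const parts: string[] = [
    "You are a tutor. Follow the teaching strategy below.",
    "",
    `Strategy: ${inputs.strategy}`,
    "",
    ...section("Course context", inputs.ragChunks),
    ...section("Active misconceptions", inputs.activeMisconceptions),
    ...section(
      "Recent quiz attempts",
      inputs.recentQuizAttempts.map((a) => `${a.quiz}: ${Math.round(a.score * 100)}%`),
    ),
  ];
  return parts.join("\n").trimEnd();
}
```

Because this step is pure string assembly, it fits the "zero LLM cost" budget of the pre-LLM stage and is easy to unit-test.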

Main LLM (streaming, SSE)

router.stream(taskType: "chat_tutor", messages, options) resolves provider via TenantTaskModelConfig. Fallback: Claude → OpenAI → xAI Grok → Google Gemini. Circuit breaker per provider (Redis state). Metering middleware: rate limit + credit check.
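
As a rough sketch of how a fallback chain and per-provider circuit breakers can interact: the code below keeps breaker state in an in-memory Map as a stand-in for the Redis state mentioned above. The class and function names, the failure threshold, and the cooldown are all assumptions; only the provider order is taken from the text.

```typescript
// Provider order taken from the fallback chain described above.
const FALLBACK_ORDER = ["claude", "openai", "xai", "gemini"] as const;
type Provider = (typeof FALLBACK_ORDER)[number];

interface BreakerState {
  failures: number; // consecutive failures seen so far
  openUntil: number; // epoch ms; the provider is skipped until then
}

// Assumed tuning values, not taken from the source.
const FAILURE_THRESHOLD = 3;
const COOLDOWN_MS = 30000;

// In-memory stand-in for the Redis-backed breaker state (hypothetical class).
class CircuitBreakers {
  private state = new Map<Provider, BreakerState>();

  isOpen(p: Provider, now: number): boolean {
    const s = this.state.get(p);
    return s !== undefined && s.openUntil > now;
  }

  // Count a failure; trip the breaker once the threshold is reached.
  recordFailure(p: Provider, now: number): void {
    const s = this.state.get(p) ?? { failures: 0, openUntil: 0 };
    s.failures += 1;
    if (s.failures >= FAILURE_THRESHOLD) {
      s.openUntil = now + COOLDOWN_MS;
      s.failures = 0;
    }
    this.state.set(p, s);
  }

  // A success clears any accumulated failures for that provider.
  recordSuccess(p: Provider): void {
    this.state.delete(p);
  }
}

// Return the first provider in fallback order whose circuit is closed,
// or null when every provider is currently tripped.
function pickProvider(breakers: CircuitBreakers, now: number): Provider | null {
  for (const p of FALLBACK_ORDER) {
    if (!breakers.isOpen(p, now)) return p;
  }
  return null;
}
```

Passing `now` explicitly keeps the selection logic deterministic under test. A shared store such as Redis would let all server instances see the same breaker state.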

Post-LLM (background via after(), fire-and-forget)

EvaluationAgent (Haiku, ~$0.001) classifies understanding, detects StudentMisconception, updates ConceptMastery (Bayesian).
ContentAgent (Haiku, ~$0.001) pre-generates follow-up exercise (Redis 30min TTL).
SessionSummarizer (every 10 turns, Haiku) summarizes long history.
SupervisorAgent (Haiku, ~$0.001) classifies severity + category, applies strikes/quarantine.
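
The Bayesian update of a Beta-distributed ConceptMastery can be sketched with the standard conjugate Beta-Bernoulli rule. The source does not state the exact update rule, so the uniform prior, the `weight` parameter, and all names below are assumptions.

```typescript
interface BetaMastery {
  alpha: number; // pseudo-count of evidence for understanding
  beta: number; // pseudo-count of evidence against understanding
}

// Uniform prior: no evidence either way (assumed starting point).
const PRIOR: BetaMastery = { alpha: 1, beta: 1 };

// Conjugate update: one observation adds `weight` to the matching count.
// Returns a new object so earlier snapshots stay unchanged.
function updateMastery(m: BetaMastery, understood: boolean, weight = 1): BetaMastery {
  return understood
    ? { alpha: m.alpha + weight, beta: m.beta }
    : { alpha: m.alpha, beta: m.beta + weight };
}

// Posterior mean, usable as a point estimate of mastery in [0, 1].
function masteryMean(m: BetaMastery): number {
  return m.alpha / (m.alpha + m.beta);
}

// Posterior variance; it shrinks as evidence accumulates.
function masteryVariance(m: BetaMastery): number {
  const n = m.alpha + m.beta;
  return (m.alpha * m.beta) / (n * n * (n + 1));
}
```

For example, starting from the uniform prior, one turn classified as understood gives Beta(2, 1), whose mean is 2/3. Keeping the full distribution rather than a single score lets downstream logic distinguish "low mastery" from "not enough evidence yet".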

Admin configuration

TenantTaskModelConfig chooses provider+model per task type
PedagogicalConfig configures thresholds per tenant (default 0.3, 0.5, 0.7, 0.9), domainOverrides, ageOverrides, learningStyleOverrides
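
One plausible reading of the four default thresholds is that they split mastery into five bands, each mapped to a teaching strategy. The sketch below follows that reading. The config shape, the inclusive boundary, and the function names are assumptions, and only domainOverrides is modelled.

```typescript
// Hypothetical config shape mirroring some of the fields named above.
interface PedagogicalConfig {
  thresholds: number[]; // ascending mastery cut points
  domainOverrides?: Record<string, number[]>; // per-domain replacement thresholds
}

// Default thresholds as listed in the text.
const DEFAULT_CONFIG: PedagogicalConfig = { thresholds: [0.3, 0.5, 0.7, 0.9] };

// Prefer a domain-specific threshold list when one is configured.
function resolveThresholds(config: PedagogicalConfig, domain?: string): number[] {
  const override = domain !== undefined ? config.domainOverrides?.[domain] : undefined;
  return override ?? config.thresholds;
}

// Band index = number of thresholds the mastery value meets or exceeds.
// With four thresholds this yields bands 0 (lowest) through 4 (highest).
function masteryBand(mastery: number, thresholds: number[]): number {
  return thresholds.filter((t) => mastery >= t).length;
}
```

Under these assumptions a mastery of 0.6 falls in band 2 with the defaults, and a tenant could shift the boundaries for a single domain without touching the others.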

Limitations

First-token latency: 800-1500ms (including pre-LLM pipeline)
Total turn latency: 2-8s for a medium-length response (~300 words)
Context window: limited by provider (Claude Sonnet 4.6 = 200K tokens, GPT-4o = 128K tokens)
Talking avatar + voice (output): available as a per-course opt-in. It provides a real-time talking avatar with TTS voice via HeyGen LiveAvatar or D-ID (BYO provider key). See Avatar & TTS.
Voice input (STT): speaking to the tutor (student speech → speech-to-text → chat) is implemented on B2B as dictation: the speech becomes text in the message field, with no auto-send. Still on the roadmap: two-way voice conversation (a real-time loop with the avatar) and automatic pronunciation scoring.