Pipeline in detail
Pre-LLM (synchronous, zero LLM cost)
- StudentModelService.getSnapshot(userId, courseId) loads ConceptMastery (Beta distribution per concept), active/resolving misconceptions, EpisodicMemory, quizContext (avgScore, passRate, weakAreas), ChatSession history.
- RetrievalAgent.retrieve(query, studentModel) reformulates the query, searches pgvector with tenant+course filters, and boosts results by quizWeakAreas.
- PedagogicalAgent.select(studentModel, query) evaluates mastery + applies adjustments (chat-quiz divergence, age, learning style) → returns strategy.
- buildEnrichedPrompt assembles system prompt with strategy + RAG context + active misconceptions + recent quiz attempts.
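The weak-area boost in the retrieval step can be pictured as a re-ranking pass over the vector search hits. The sketch below is only an illustration, not the actual RetrievalAgent code: the `RetrievedChunk` shape, the `boostByWeakAreas` helper, and the 1.25 multiplier are all assumptions made for this example.

```typescript
// Hypothetical shape of a vector search hit; the real types are not shown in this document.
interface RetrievedChunk {
  id: string;
  concept: string;    // concept label attached to the chunk (assumed)
  similarity: number; // similarity score from the vector search, higher is better
}

// Assumed multiplier; the real boost value is not documented here.
const WEAK_AREA_BOOST = 1.25;

// Re-rank hits so chunks covering the student's weak quiz areas move up.
function boostByWeakAreas(
  chunks: RetrievedChunk[],
  weakAreas: string[],
  boost: number = WEAK_AREA_BOOST,
): RetrievedChunk[] {
  const weak = new Set(weakAreas.map((area) => area.toLowerCase()));
  return chunks
    .map((chunk) => ({
      ...chunk,
      similarity: weak.has(chunk.concept.toLowerCase())
        ? chunk.similarity * boost
        : chunk.similarity,
    }))
    .sort((a, b) => b.similarity - a.similarity);
}
```

A multiplicative boost keeps the original ranking among non-weak chunks intact and only lifts the chunks tied to weak areas; an additive bonus would be an equally plausible choice.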
Main LLM (streaming, SSE)
router.stream(taskType: "chat_tutor", messages, options) resolves provider via TenantTaskModelConfig. Fallback: Claude → OpenAI → xAI Grok → Google Gemini. Circuit breaker per provider (Redis state). Metering middleware: rate limit + credit check.
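The fallback chain and per-provider circuit breaker described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the breaker state lives in an in-memory `Map` rather than Redis, the failure threshold and cooldown values are invented, and `CircuitBreaker` and `callWithFallback` are hypothetical names, not the router's real API. Metering (rate limit and credit check) is left out.

```typescript
// Provider order taken from the fallback chain described above.
type Provider = "claude" | "openai" | "xai" | "gemini";
const FALLBACK_ORDER: Provider[] = ["claude", "openai", "xai", "gemini"];

// In-memory stand-in for the Redis-backed breaker state (assumption).
class CircuitBreaker {
  private failures = new Map<Provider, number>();
  private openedAt = new Map<Provider, number>();
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures = 3, cooldownMs = 30000) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  isOpen(p: Provider, now: number): boolean {
    const opened = this.openedAt.get(p);
    if (opened === undefined) return false;
    if (now - opened >= this.cooldownMs) {
      // Cooldown elapsed: reset the breaker and let the next call through.
      this.recordSuccess(p);
      return false;
    }
    return true;
  }

  recordFailure(p: Provider, now: number): void {
    const count = (this.failures.get(p) ?? 0) + 1;
    this.failures.set(p, count);
    if (count >= this.maxFailures) this.openedAt.set(p, now);
  }

  recordSuccess(p: Provider): void {
    this.failures.delete(p);
    this.openedAt.delete(p);
  }
}

// Try each provider in order, skipping any whose breaker is open.
async function callWithFallback<T>(
  call: (p: Provider) => Promise<T>,
  breaker: CircuitBreaker,
  now: () => number = Date.now,
): Promise<{ provider: Provider; result: T }> {
  for (const p of FALLBACK_ORDER) {
    if (breaker.isOpen(p, now())) continue;
    try {
      const result = await call(p);
      breaker.recordSuccess(p);
      return { provider: p, result };
    } catch {
      // Count the failure, then fall through to the next provider.
      breaker.recordFailure(p, now());
    }
  }
  throw new Error("All providers failed or are circuit-broken");
}
```

Keeping the breaker state outside the process (Redis in the real system) lets every server instance skip an unhealthy provider after one instance has observed the failures.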
Post-LLM (background via after(), fire-and-forget)
- EvaluationAgent (Haiku, ~$0.001) classifies understanding, detects StudentMisconception, updates ConceptMastery (Bayesian).
- ContentAgent (Haiku, ~$0.001) pre-generates follow-up exercise (Redis 30min TTL).
- SessionSummarizer (every 10 turns, Haiku) summarizes long history.
- SupervisorAgent (Haiku, ~$0.001) classifies severity + category, applies strikes/quarantine.
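Because ConceptMastery is stored as a Beta distribution per concept, the Bayesian update performed by EvaluationAgent can be illustrated with the standard Beta-Bernoulli conjugate update. The sketch below assumes each evaluated turn counts as one success or failure with an optional weight; how the real agent weights its evidence is not specified here, and `BetaMastery`, `updateMastery`, and `masteryMean` are hypothetical names.

```typescript
// Beta(alpha, beta) belief about the probability that the student has mastered a concept.
interface BetaMastery {
  alpha: number; // pseudo-count of evidence for understanding
  beta: number;  // pseudo-count of evidence against understanding
}

// Conjugate update: add the evidence weight to alpha on success, to beta on failure.
// The weight parameter is an assumption (e.g. for down-weighting uncertain classifications).
function updateMastery(m: BetaMastery, understood: boolean, weight = 1): BetaMastery {
  return understood
    ? { alpha: m.alpha + weight, beta: m.beta }
    : { alpha: m.alpha, beta: m.beta + weight };
}

// Posterior mean, usable as a single mastery score in [0, 1].
function masteryMean(m: BetaMastery): number {
  return m.alpha / (m.alpha + m.beta);
}

// Usage: start from a uniform prior Beta(1, 1) and observe one correct answer.
const prior: BetaMastery = { alpha: 1, beta: 1 };
const posterior = updateMastery(prior, true);
console.log(masteryMean(prior), masteryMean(posterior)); // 0.5, then 2/3
```

Storing the two counts rather than a single score preserves how much evidence backs the estimate, so a new student and a long-observed student with the same mean can be treated differently.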
Admin configuration
- TenantTaskModelConfig chooses provider+model per task type
- PedagogicalConfig configures thresholds per tenant (default 0.3, 0.5, 0.7, 0.9), domainOverrides, ageOverrides, learningStyleOverrides
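One way to read the four default thresholds is as boundaries between five mastery bands, with per-domain overrides replacing the defaults when present. The sketch below makes that reading concrete; the config shape, the override lookup, and the numeric band index are assumptions for illustration, and the actual strategy attached to each band is not named in this document.

```typescript
// Four ascending cut points that split mastery in [0, 1] into five bands.
type Thresholds = [number, number, number, number];

// Hypothetical, reduced view of PedagogicalConfig (age and learning-style overrides omitted).
interface PedagogicalConfigSketch {
  thresholds: Thresholds;
  domainOverrides?: Record<string, Thresholds>;
}

// Defaults as listed above.
const DEFAULT_CONFIG: PedagogicalConfigSketch = {
  thresholds: [0.3, 0.5, 0.7, 0.9],
};

// Use the domain-specific thresholds if the tenant configured them, else the defaults.
function resolveThresholds(config: PedagogicalConfigSketch, domain?: string): Thresholds {
  const override = domain !== undefined ? config.domainOverrides?.[domain] : undefined;
  return override ?? config.thresholds;
}

// Band index 0..4: the number of thresholds the mastery score has reached.
function masteryBand(mastery: number, thresholds: Thresholds): number {
  return thresholds.filter((t) => mastery >= t).length;
}
```

For example, with the defaults a mastery score of 0.65 falls in band 2 (it has reached 0.3 and 0.5 but not 0.7), while a tenant that raises the thresholds for a given domain would place the same score in a lower band.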
Limitations
- First-token latency: 800-1500ms (including pre-LLM pipeline)
- Total turn latency: 2-8s for a medium-length response (~300 words)
- Context window: limited by provider (Claude Sonnet 4.6 = 200K tokens, GPT-4o = 128K tokens)
- Talking avatar + voice (output): available as a per-course opt-in. A real-time talking avatar with TTS voice runs via HeyGen LiveAvatar or D-ID (BYO provider key). See Avatar & TTS.
- Voice input (STT): speaking to the tutor (student speech → speech-to-text → chat) is not implemented yet (roadmap).