Studeia estimates student proficiency with Item Response Theory (IRT) for ENEM-style assessments. This page explains what IRT is, how our implementation works and, just as importantly, what it is not.
Quick answer
- 2PL IRT model (discrimination + difficulty), ENEM-style
- NOT INEP's official 3PL (no guessing parameter)
- Calibration needs ~20 responses per item; below that, it falls back to Classical Test Theory (CTT)
- Proficiency (theta) via EAP, 0–1000 scale
- Useful to measure fairly and spot problematic items
What IRT is
Item Response Theory models the probability of answering a question correctly as a function of the student's proficiency and the item's characteristics. In the 2PL model, each item has two parameters:
- Discrimination (a): how well the item separates strong from weak students.
- Difficulty (b): the proficiency level at which the chance of a correct answer is 50%.
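Unlike a simple percent-correct score, IRT weights items differently: a highly discriminating question "counts more" toward the estimate than an ambiguous one, and an item's difficulty determines where on the proficiency scale it is most informative.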
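To make the two parameters concrete, here is a minimal sketch of the standard 2PL response function. The function name `p_correct_2pl` and the two example items are hypothetical, chosen only for illustration; they are not Studeia's actual code or item bank.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct answer under the 2PL model.

    theta: student proficiency
    a:     item discrimination
    b:     item difficulty (the theta at which the probability is 50%)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items with the same difficulty but different discrimination.
sharp_item = {"a": 2.0, "b": 0.0}
flat_item = {"a": 0.5, "b": 0.0}

for theta in (-1.0, 0.0, 1.0):
    print(
        theta,
        round(p_correct_2pl(theta, **sharp_item), 3),
        round(p_correct_2pl(theta, **flat_item), 3),
    )
```

At `theta == b` both items give exactly 50%. Away from that point, the item with the higher `a` moves toward 0% or 100% much faster, which is what "separating strong from weak students" means in practice.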
Honesty: 2PL, not INEP's 3PL
The official ENEM uses a 3PL model, which adds a third parameter for guessing (the chance that a low-proficiency student answers correctly by luck). Studeia implements 2PL: it follows ENEM's approach in spirit but is not INEP's official calculation. We use it to rank proficiency and review items, without promising to reproduce the official score.
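For reference, the difference can be written with the standard textbook forms of the two models, where θ is proficiency, a is discrimination, b is difficulty and c is the guessing parameter. These are the generic formulas; the exact parameterization used by INEP or by Studeia (for example, any scaling constant) is not specified here.

```latex
P_{\text{3PL}}(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}
\qquad\qquad
P_{\text{2PL}}(\theta) = \frac{1}{1 + e^{-a(\theta - b)}}
```

Setting c = 0 in the 3PL formula gives the 2PL formula. In other words, 2PL assumes that a student with very low proficiency has a near-zero chance of a correct answer, while 3PL lets that chance level off at c.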
How it works
- Calibration: each item's discrimination (a) and difficulty (b) are estimated from the collected responses via a CTT→logistic transformation.
- Minimum data: below ~20 responses per item, the platform uses CTT (percent correct) to avoid unstable estimates.
- Proficiency (theta): estimated via EAP (Expected A Posteriori) with a normal prior.
- Scale: theta is converted to a 0–1000 scale, ENEM-style.
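The last two steps can be sketched as follows. This is an illustration under stated assumptions, not Studeia's implementation: it assumes a standard normal prior (mean 0, standard deviation 1), approximates the posterior on a fixed grid of theta values, and uses a hypothetical linear mapping (500 + 100 × theta, clipped to 0–1000) for the scale step, since the actual conversion is not documented here. All function names and item parameters are made up for the example.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate of theta: the posterior mean over a grid of theta values.

    responses: list of 0/1 answers, one per item
    items:     list of (a, b) tuples in the same order
    Assumes a standard normal prior (an assumption of this sketch).
    """
    step = (hi - lo) / (n_points - 1)
    weighted_sum = 0.0
    total_weight = 0.0
    for k in range(n_points):
        theta = lo + k * step
        # Log of the N(0, 1) prior density, up to an additive constant.
        log_w = -0.5 * theta * theta
        for x, (a, b) in zip(responses, items):
            p = p_correct_2pl(theta, a, b)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            log_w += math.log(p) if x == 1 else math.log(1.0 - p)
        w = math.exp(log_w)
        weighted_sum += theta * w
        total_weight += w
    return weighted_sum / total_weight


def to_enem_style_scale(theta, center=500.0, spread=100.0):
    """Hypothetical linear mapping of theta to 0-1000, clipped at both ends."""
    return max(0.0, min(1000.0, center + spread * theta))


# Hypothetical calibrated items: (discrimination a, difficulty b).
items = [(0.6, -1.0), (1.0, 0.0), (1.8, 1.0)]

student_a = [1, 1, 0]  # two correct: the easier, less discriminating items
student_b = [0, 1, 1]  # two correct: the harder, more discriminating items

for name, answers in (("A", student_a), ("B", student_b)):
    theta = eap_theta(answers, items)
    print(name, round(theta, 3), round(to_enem_style_scale(theta), 1))
```

Both students answer two of three items correctly, yet student B receives the higher estimate, because under 2PL each correct answer is weighted by the item's discrimination.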
What to use it for
| Goal | How IRT helps |
|---|---|
| Measure fairly | Hard/discriminating items weigh differently |
| Compare students | Common 0–1000 scale |
| Review the exam | Low-discrimination items are removal candidates |
| ENEM simulations | Proficiency in the exam's style |
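As a rough illustration of the "Review the exam" row, the sketch below sorts items into three buckets: too few responses to calibrate, calibrated but weakly discriminating, and fine. The `ItemStats` type, the status labels and the 0.5 discrimination cutoff are all hypothetical; only the ~20-response threshold comes from this page.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RESPONSES = 20        # the "~20 responses per item" guideline
LOW_DISCRIMINATION = 0.5  # hypothetical cutoff, for illustration only


@dataclass
class ItemStats:
    item_id: str
    n_responses: int
    n_correct: int
    a: Optional[float] = None  # 2PL discrimination, if the item was calibrated


def percent_correct(item: ItemStats) -> float:
    """CTT difficulty index: share of correct answers (0.0 if no responses)."""
    return item.n_correct / item.n_responses if item.n_responses else 0.0


def review_status(item: ItemStats) -> str:
    """Classify an item for exam review."""
    if item.n_responses < MIN_RESPONSES:
        return "ctt_fallback"      # not enough data: report percent correct only
    if item.a is not None and item.a < LOW_DISCRIMINATION:
        return "removal_candidate" # calibrated, but separates students poorly
    return "ok"


# Hypothetical item bank.
bank = [
    ItemStats("q1", n_responses=12, n_correct=7),
    ItemStats("q2", n_responses=85, n_correct=60, a=0.3),
    ItemStats("q3", n_responses=85, n_correct=41, a=1.4),
]

for item in bank:
    print(item.item_id, review_status(item), round(percent_correct(item), 2))
```

A flag like this is only a starting point: whether an item is actually removed remains a pedagogical decision.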
Stated limitations
- It's 2PL, not 3PL — it doesn't reproduce ENEM's official score.
- It needs enough responses to calibrate (about 20 per item); otherwise it falls back to CTT.
- It doesn't replace pedagogical analysis — it's a measurement tool.
FAQ
Is it INEP's 3PL? No — 2PL ENEM-style, without a guessing parameter.
How many responses to calibrate? ~20 per item; below that, CTT fallback.
On what scale? Theta via EAP converted to 0–1000.
What's it for? Measuring fairly, comparing students and reviewing items.
See the Quiz Engine and the ENEM test-prep use case.