Proficiency via IRT (2PL model, ENEM-style)

Studeia offers a proficiency calculation via Item Response Theory (IRT) for ENEM-style assessments. Here we explain what it is, how it works — and, honestly, what it is not.

Quick answer

2PL IRT model (discrimination + difficulty), ENEM-style
NOT INEP's official 3PL (no guessing parameter)
Calibration needs ~20 responses/item; below that, CTT fallback
Proficiency (theta) via EAP, 0–1000 scale
Useful to measure fairly and spot problematic items

What IRT is

Item Response Theory models the probability of answering a question correctly as a function of the student's proficiency and the item's characteristics. In the 2PL model, each item has two parameters:

Discrimination (a): how sharply the chance of a correct answer rises with proficiency, i.e. how well the item separates stronger from weaker students.
Difficulty (b): the proficiency level at which the chance of a correct answer is 50%.
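For reference, this is the standard logistic form of the 2PL item response function, with θ the student's proficiency. The second identity restates the definition of difficulty above, and the third shows discrimination as the slope of the curve at θ = bᵢ. Some presentations put a scaling constant D ≈ 1.7 inside the exponent; whether Studeia does is not stated here.

```latex
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},
\qquad P_i(b_i) = \tfrac{1}{2},
\qquad \left.\frac{dP_i}{d\theta}\right|_{\theta = b_i} = \frac{a_i}{4}
```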

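Unlike a simple percentage, IRT does not count every item equally. In 2PL, each correct answer is weighted by the item's discrimination, so a sharply discriminating question "counts more" than an ambiguous one, while difficulty determines where on the proficiency scale the item measures best.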
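Why discrimination, rather than difficulty, sets an item's weight follows from the log-likelihood of a response pattern xᵢ ∈ {0, 1} under the standard 2PL form Pᵢ(θ) = 1 / (1 + e^(−aᵢ(θ − bᵢ))), using log Pᵢ − log(1 − Pᵢ) = aᵢ(θ − bᵢ):

```latex
\log L(\theta)
  = \sum_i \Big[ x_i \log P_i(\theta) + (1 - x_i)\log\big(1 - P_i(\theta)\big) \Big]
  = \theta \sum_i a_i x_i \;-\; \sum_i a_i b_i x_i \;+\; \sum_i \log\big(1 - P_i(\theta)\big)
```

The middle term does not depend on θ and the last term does not depend on the answers, so the posterior over θ depends on the responses only through the weighted score Σ aᵢxᵢ. A correct answer on an item with a = 2 adds as much to that score as two correct answers on items with a = 1, whatever their difficulties.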

Honesty: 2PL, not INEP's 3PL

The official ENEM uses a 3PL model, which adds a third parameter for guessing: the chance that a low-proficiency student still answers correctly. Studeia implements 2PL. It follows the ENEM approach, but it is not INEP's official calculation. We use it to rank proficiency and review items, without promising to reproduce the official score.
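To make the difference concrete, here is a minimal sketch of both item response functions in their textbook form. The 3PL adds a lower asymptote c, so even a student with very low proficiency answers correctly with probability of about c; with c = 0 it reduces to 2PL. The function names, the item parameters and c = 0.2 (one in five, as for a five-option question) are hypothetical illustration values, not INEP's calibrated ones.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL: probability of a correct answer (logistic form, no D constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL: adds a lower asymptote c (the guessing parameter) to the 2PL curve."""
    return c + (1.0 - c) * p_2pl(theta, a, b)


if __name__ == "__main__":
    # Hypothetical item: a = 1.5, b = 1.0; c = 0.2 mimics blind guessing among 5 options.
    for theta in (-3.0, 0.0, 3.0):
        print(f"theta={theta:+.1f}  2PL={p_2pl(theta, 1.5, 1.0):.3f}  "
              f"3PL={p_3pl(theta, 1.5, 1.0, 0.2):.3f}")
```

The gap between the two curves is largest for weak students, which is exactly where a 2PL score and an official 3PL score can diverge.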

How it works

Calibration: item discrimination and difficulty are estimated from student responses by converting classical (CTT) item statistics into logistic 2PL parameters.
Minimum data: below ~20 responses per item, the platform uses CTT (percent correct) to avoid unstable estimates.
Proficiency (theta): estimated via EAP (Expected A Posteriori) with a normal prior.
Scale: theta converted to 0–1000, ENEM-style.
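The calibration step is described only as a "CTT→logistic transformation". One classical way to do this, sketched here under assumptions, converts an item's proportion correct p and its biserial correlation r with the rest of the test into a = D·r/√(1 − r²) and b = Φ⁻¹(1 − p)/r, where D ≈ 1.7 rescales from the normal-ogive to the logistic metric. The function names, the use of rest scores, D and the exact fallback conditions are hypothetical; this is not Studeia's code.

```python
import math
from statistics import NormalDist

MIN_RESPONSES = 20  # from the "~20 responses per item" rule; the exact cut-off is assumed
D = 1.7             # assumed normal-ogive -> logistic scaling constant


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ctt_to_2pl(p, r_bis):
    """Classical approximation: (proportion correct, biserial r) -> logistic (a, b)."""
    a = D * r_bis / math.sqrt(1.0 - r_bis ** 2)
    b = NormalDist().inv_cdf(1.0 - p) / r_bis  # easy items (high p) get negative b
    return a, b


def calibrate_item(item_scores, rest_scores):
    """Return ("2pl", a, b) or ("ctt", p) for one item.

    item_scores: 0/1 per student; rest_scores: each student's score on the other items.
    """
    n = len(item_scores)
    p = sum(item_scores) / n
    if n < MIN_RESPONSES or p in (0.0, 1.0):
        return ("ctt", p)  # too little data or no variance: keep percent correct
    std = NormalDist()
    r_pb = pearson(item_scores, rest_scores)  # point-biserial correlation
    # Convert point-biserial to biserial, assuming a normally distributed latent trait.
    r_bis = r_pb * math.sqrt(p * (1.0 - p)) / std.pdf(std.inv_cdf(p))
    if not 0.0 < r_bis < 1.0:
        return ("ctt", p)  # the transformation is undefined outside (0, 1)
    a, b = ctt_to_2pl(p, r_bis)
    return ("2pl", a, b)
```

Using the rest score rather than the total keeps the item from correlating with itself, which would inflate its discrimination.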

What to use it for

Goal	How IRT helps
Measure fairly	Items weighted by discrimination, not counted equally
Compare students	Common 0–1000 scale
Review the exam	Low-discrimination items are removal candidates
ENEM simulations	Proficiency in the exam's style
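Flagging items for review can be as simple as filtering on the calibrated discrimination. The sketch below also computes the standard 2PL item information a²·P·(1 − P), which peaks at a²/4 when θ = b: a low-a item adds little measurement precision anywhere on the scale, which is why it is a removal candidate. The 0.65 cut-off, the function names and the item values are hypothetical; Studeia's actual threshold is not stated here.

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at proficiency theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def flag_low_discrimination(items, min_a=0.65):
    """Return, sorted, the ids of items whose discrimination a is below min_a.

    Negative discrimination (stronger students miss the item more) is caught too.
    """
    return sorted(item_id for item_id, (a, _b) in items.items() if a < min_a)


if __name__ == "__main__":
    items = {"Q1": (1.4, -0.5), "Q2": (0.3, 0.2), "Q3": (0.9, 1.1)}  # hypothetical (a, b)
    print(flag_low_discrimination(items))  # ['Q2']
    for item_id, (a, b) in items.items():
        print(item_id, "peak information:", round(item_information(b, a, b), 3))
```

With a = 0.3 the peak information is about 0.02, versus about 0.49 for a = 1.4.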

Limitations (stated)

It's 2PL, not 3PL — it doesn't reproduce ENEM's official score.
It needs enough responses per item to calibrate; otherwise it falls back to CTT.
It doesn't replace pedagogical analysis — it's a measurement tool.

FAQ

Is it INEP's 3PL? No — 2PL ENEM-style, without a guessing parameter.

How many responses to calibrate? ~20 per item; below that, CTT fallback.

On what scale? Theta via EAP converted to 0–1000.

What's it for? Measuring fairly, comparing students and reviewing items.

See the Quiz Engine and the ENEM test-prep use case.

Does Studeia use the same 3PL IRT as ENEM/INEP?

No. Studeia implements a 2PL IRT model (two parameters: discrimination and difficulty), ENEM-style, but it is not INEP's official 3PL model (which includes a guessing parameter). It's an honest 2PL IRT, useful for ranking proficiency and spotting problematic items, without passing itself off as ENEM's official calculation.

How many responses are needed to calibrate?

2PL calibration needs a minimum volume of responses per item to be reliable — around 20 responses. Below that threshold, the platform falls back to Classical Test Theory (CTT, based on percent correct), avoiding unstable estimates with little data.
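As an illustration of why small samples are unstable, consider the simplest statistic calibration relies on, an item's proportion correct. Its standard error shrinks only with the square root of the number of responses:

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{n}}, \qquad
n = 20,\ p = 0.5:\ \sqrt{\frac{0.25}{20}} \approx 0.112, \qquad
n = 200:\ \sqrt{\frac{0.25}{200}} \approx 0.035
```

With 20 responses, an item whose true rate is 50% can plausibly show anywhere from about 28% to 72% correct (±1.96 SE), and correlation-based discrimination estimates are also noisy at that size. These numbers illustrate the sampling argument; they are not Studeia's calibration rule.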

How is proficiency estimated and on what scale?

Proficiency (theta) is estimated via EAP (Expected A Posteriori) with a normal prior, from the calibrated 2PL parameters, and converted to a 0–1000 ENEM-style scale. This lets you compare students on a common ruler, rather than just by raw number correct.
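The EAP step can be sketched with simple numerical quadrature: evaluate prior × likelihood on a grid of θ values, normalize, and take the posterior mean. This sketch assumes a standard normal prior N(0, 1), a grid from −4 to 4, and a hypothetical linear mapping 500 + 100·θ clipped to 0–1000 for the ENEM-style scale; Studeia's actual prior parameters, grid and scale transformation are not stated in this article.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct answer (logistic form, no D constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate of theta under a standard normal prior, by grid quadrature.

    responses: 0/1 per item; items: calibrated (a, b) per item.
    Returns (posterior mean, posterior standard deviation).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # N(0, 1) density, up to a constant
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            w *= p if x == 1 else 1.0 - p  # multiply in the 2PL likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def to_scale(theta, center=500.0, spread=100.0):
    """Hypothetical linear map of theta onto a 0-1000 ENEM-style scale."""
    return min(1000.0, max(0.0, center + spread * theta))


if __name__ == "__main__":
    items = [(1.2, -1.0), (0.8, 0.0), (1.5, 1.0)]  # hypothetical calibrated items
    theta, sd = eap_theta([1, 1, 0], items)
    print(f"theta = {theta:.2f} (SD {sd:.2f}) -> score {to_scale(theta):.0f}")
```

Because the 2PL likelihood depends on the answers only through the discrimination-weighted score Σ aᵢxᵢ, two response patterns with the same weighted score receive the same estimate; the posterior SD also gives a rough margin of error for each student's score.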

What is IRT useful for in practice?

To measure proficiency more fairly than a simple percentage (each question is weighted by its discrimination instead of counting equally), rank students on a comparable scale, and identify problematic items (low discrimination) for review. It's useful in ENEM-style test-prep and simulations and in large-scale assessments.