Item Analysis Calculator

Enter each test question's results to calculate the difficulty index (p-value) and discrimination index (D) for every item. Input the number of questions, the number of students, and, for each question, how many students in the upper and lower scoring groups answered correctly. The Item Analysis Calculator returns per-question difficulty, discrimination, and point-biserial correlation, helping you identify which questions are too easy, too hard, or fail to differentiate strong from weak students.

Total students who took the test

Number of items to analyze (max 50)

Percentage of students in each extreme group

Results

Average Difficulty Index (p̄)

Average Discrimination Index (D̄)

Estimated Cronbach's Alpha

Items Analyzed

Group Size (per extreme group)

Difficulty & Discrimination by Question

Frequently Asked Questions

What is the difficulty index (p-value) in item analysis?

The difficulty index (p) is the proportion of all students who answered a question correctly. It ranges from 0.0 (no one got it right) to 1.0 (everyone got it right). An ideal item difficulty is between 0.30 and 0.70 — items outside this range may be too easy or too hard to effectively measure student ability.
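A minimal Python sketch of the p-value. The first function follows the definition above: correct answers divided by all students. The second is an assumption about how p can be estimated when only upper- and lower-group counts are entered. It pools the two extreme groups, a common upper-lower approach, but the calculator's exact method is not stated here. Both function names are hypothetical.

```python
def difficulty_index(num_correct: int, num_students: int) -> float:
    """Proportion of all students who answered the item correctly."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


def difficulty_from_groups(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Estimate p by pooling the upper and lower groups (2 * group_size students).

    Assumption: only the extreme groups are available, so the middle of the
    class does not contribute to this estimate.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= upper_correct <= group_size and 0 <= lower_correct <= group_size):
        raise ValueError("correct counts must be between 0 and group_size")
    return difficulty_index(upper_correct + lower_correct, 2 * group_size)


print(difficulty_index(18, 30))         # 0.6
print(difficulty_from_groups(7, 3, 8))  # 0.625
```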

What is the discrimination index (D)?

The discrimination index (D) measures how well a question distinguishes between high-performing and low-performing students. It is calculated as D = (U − L) / n, where U and L are the numbers of students in the upper and lower groups who answered correctly and n is the number of students in each group, so D ranges from −1.0 to +1.0. Values above 0.30 are considered good; values below 0.20 suggest the item needs revision. A negative value means weaker students answered the item correctly more often than stronger students, which usually signals a flawed question.
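The formula and cut-offs above can be sketched in Python as follows. The function names are hypothetical, and the "marginal" label for values between 0.20 and 0.30 is an assumption, since the text does not name that band.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """D = (U - L) / n, with n students in each extreme group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= upper_correct <= group_size and 0 <= lower_correct <= group_size):
        raise ValueError("correct counts must be between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


def rate_discrimination(d: float) -> str:
    """Apply the cut-offs from the text; the middle-band label is hypothetical."""
    if d < 0:
        return "negative: review the item"
    if d < 0.20:
        return "needs revision"
    if d > 0.30:
        return "good"
    return "marginal"  # 0.20 <= D <= 0.30: not named in the text


d = discrimination_index(7, 3, 8)
print(d, rate_discrimination(d))  # 0.5 good
```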

What percentage split should I use for upper and lower groups?

The classical standard is the top and bottom 27% of students, which maximizes the reliability of the discrimination index. Some educators use 25% (top/bottom quarter) or 33% (top/bottom third). For smaller classes, a 50% split (median split) is acceptable to maintain adequate group sizes.
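Turning a percentage split into a group size needs a rounding rule, which the text does not specify. This sketch assumes round-half-up to a whole student and caps each group at half the class so the upper and lower groups never overlap; the function name is hypothetical.

```python
import math


def extreme_group_size(total_students: int, percent: float) -> int:
    """Students in each extreme group (upper or lower).

    Assumptions: round half up to a whole student, and never let the two
    groups overlap (each holds at most half the class).
    """
    if total_students < 2:
        raise ValueError("need at least two students")
    if not 0 < percent <= 50:
        raise ValueError("percent must be greater than 0 and at most 50")
    n = math.floor(total_students * percent / 100 + 0.5)
    return max(1, min(n, total_students // 2))


for pct in (27, 25, 33, 50):
    print(pct, extreme_group_size(30, pct))
```

With 30 students, the 27% split gives groups of 8 (27% of 30 is 8.1). Python's built-in `round()` rounds halves to the nearest even number, which is why the sketch uses `math.floor(x + 0.5)` instead.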

What do the difficulty level labels mean?

Items with p < 0.20 are classified as 'Very Hard', 0.20–0.39 as 'Hard', 0.40–0.59 as 'Moderate', 0.60–0.79 as 'Easy', and p ≥ 0.80 as 'Very Easy'. A balanced test typically contains a mix of difficulty levels, with most items in the moderate range.
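The label bands can be expressed as a small lookup. This sketch assumes each band includes its lower bound and excludes its upper bound (so 0.395 counts as "Hard"), which is one reading of the ranges listed; the function name is hypothetical.

```python
def difficulty_label(p: float) -> str:
    """Map a difficulty index p (0.0 to 1.0) to the labels listed in the FAQ."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.20:
        return "Very Hard"
    if p < 0.40:
        return "Hard"
    if p < 0.60:
        return "Moderate"
    if p < 0.80:
        return "Easy"
    return "Very Easy"


print(difficulty_label(0.625))  # Easy
```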

How is Cronbach's Alpha estimated here?

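This calculator estimates Cronbach's Alpha with a Kuder–Richardson approximation based on average item difficulty and variance. KR-20 is the form of Cronbach's Alpha for items scored right or wrong; using the average difficulty in place of each item's own difficulty corresponds to its KR-21 simplification. Alpha reflects internal consistency: how well the test items measure the same underlying construct. Values above 0.70 are generally considered acceptable for classroom tests; values above 0.80 are good.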
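A sketch of the two Kuder–Richardson formulas. KR-20 is α = k/(k − 1) × (1 − Σ pᵢqᵢ / σ²), where k is the number of items, qᵢ = 1 − pᵢ, and σ² is the variance of students' total scores. Replacing every pᵢ with the average p̄ gives KR-21. How the calculator obtains σ² from upper and lower counts is not stated, so this sketch assumes the variance is supplied directly; the function names and the example numbers are illustrative only.

```python
from typing import Sequence


def kr20(item_p: Sequence[float], total_score_variance: float) -> float:
    """KR-20 reliability for right/wrong items.

    alpha = k / (k - 1) * (1 - sum(p * q) / variance), with q = 1 - p and
    variance = variance of students' total test scores (assumed given).
    """
    k = len(item_p)
    if k < 2:
        raise ValueError("need at least two items")
    if total_score_variance <= 0:
        raise ValueError("total score variance must be positive")
    sum_pq = sum(p * (1 - p) for p in item_p)
    return k / (k - 1) * (1 - sum_pq / total_score_variance)


def kr21(num_items: int, mean_p: float, total_score_variance: float) -> float:
    """KR-21: KR-20 with every item's p replaced by the average p."""
    return kr20([mean_p] * num_items, total_score_variance)


# 10 items, average p = 0.60, total-score variance = 6.0 (illustrative numbers)
print(round(kr21(10, 0.60, 6.0), 3))  # 0.667
```

Because KR-21 assumes all items are equally difficult, it typically gives a lower estimate than KR-20 when item difficulties actually vary.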

How many questions can I analyze at once?

This calculator supports analysis of up to 5 questions simultaneously using the default setup. Enter upper and lower group correct counts for each question. The calculator uses only as many questions as you specify in the 'Number of Test Questions' field, so you can set that to fewer than 5 if needed.

What should I do if a question has a negative discrimination index?

A negative D-value means lower-scoring students outperformed higher-scoring students on that item — a red flag. Review the question for ambiguous wording, multiple defensible answers, or a keying error. Such items should be revised or dropped from scoring before finalizing test results.

Why do difficulty and discrimination need to be considered together?

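A very easy or very hard item (p near 1 or 0) cannot show strong discrimination, regardless of its quality, because nearly all students in both groups answer it the same way and there is little variance to discriminate. When p is computed from the upper and lower groups combined, D can be no larger than 2 × min(p, 1 − p), so an item with p = 0.90 has a maximum D of 0.20. This is why good test design targets moderate difficulty (0.40–0.70): it gives items the best chance of showing meaningful discrimination between student ability levels.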
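The ceiling on D follows from putting as many correct answers as possible in the upper group. This sketch assumes equal-sized groups, p = (U + L) / (2n), and D = (U − L) / n; if p ≤ 0.5 the best case is L = 0, giving D = 2p, and otherwise U = n, giving D = 2(1 − p). The function name is hypothetical.

```python
def max_discrimination(p: float) -> float:
    """Largest D possible for an item with difficulty p (upper-lower method).

    Assumes p = (U + L) / (2n) and D = (U - L) / n with equal-sized groups,
    so D_max = 2 * min(p, 1 - p).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 2 * min(p, 1 - p)


for p in (0.05, 0.30, 0.50, 0.70, 0.95):
    print(f"p = {p:.2f} -> max D = {max_discrimination(p):.2f}")
```

The ceiling peaks at 1.0 when p = 0.50 and falls linearly toward 0 at either extreme, which is the arithmetic behind targeting moderate difficulty.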
