Kappa Statistic Calculator

Enter your confusion matrix counts into a grid of any size from 2×2 up to 5×5 and get back Cohen's Kappa (κ), its standard error, z-score, p-value, 95% confidence interval, and a Landis & Koch interpretation. Choose the number of rating categories, fill in the observed agreement cells, and the tool handles all of the chance-correction math for you.

Select how many rating categories both raters used.

Both raters chose Category 1

Rater 1 chose Cat 1, Rater 2 chose Cat 2

Rater 1 chose Cat 2, Rater 2 chose Cat 1

Both raters chose Category 2

Results

Cohen's Kappa (κ)

--

Agreement Level

--

Observed Agreement (Po)

--

Chance Agreement (Pe)

--

Standard Error

--

Z-Score

--

P-Value

--

95% CI Lower

--

95% CI Upper

--

Observed vs Chance Agreement

Frequently Asked Questions

What is Cohen's Kappa and why is it better than simple percent agreement?

Cohen's Kappa (κ) is a statistical measure of inter-rater reliability that accounts for the agreement expected purely by chance. Simple percent agreement ignores the fact that two raters could agree on some items randomly. Kappa corrects for this, providing a more accurate picture of genuine consensus between raters.

How do I interpret a Kappa value?

The Landis & Koch (1977) guidelines are the most widely used: κ < 0 indicates poor (less-than-chance) agreement, 0.00–0.20 is slight, 0.21–0.40 is fair, 0.41–0.60 is moderate, 0.61–0.80 is substantial, and 0.81–1.00 is almost perfect agreement. A kappa of 1.0 means perfect agreement; 0 means agreement no better than chance.
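The banding above is a simple lookup. A minimal sketch of that mapping (the label wording is illustrative, not part of Landis & Koch's original table):

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to a Landis & Koch (1977) agreement band."""
    if kappa < 0:
        return "Poor (less than chance)"
    # Upper bound of each band, checked in ascending order.
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Substantial"), (1.00, "Almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"  # kappa cannot exceed 1, but be safe
```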

What does the confusion matrix represent in this calculator?

The confusion matrix (also called a contingency table) holds counts of how often each rater assigned each category. Each cell [Row i, Col j] contains the number of subjects that Rater 1 placed in category i and Rater 2 placed in category j. Diagonal cells represent agreements; off-diagonal cells represent disagreements.
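If you have raw paired ratings rather than pre-tallied counts, the matrix is built by counting each (Rater 1, Rater 2) pair into its cell. A small sketch, assuming categories are coded 0 to k−1:

```python
def confusion_matrix(rater1, rater2, k):
    """Tally paired ratings into a k x k contingency table.

    Cell [i][j] holds the number of subjects that Rater 1 placed
    in category i and Rater 2 placed in category j (0-based codes).
    """
    m = [[0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        m[a][b] += 1
    return m

# Example: four subjects, two categories.
# Diagonal cells (m[0][0], m[1][1]) are agreements.
m = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2)
```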

How many categories can I use with this Kappa calculator?

This calculator supports 2 to 5 rating categories. Select the number of categories in the setup dropdown, then fill in the corresponding cells of the confusion matrix. Cells outside your selected matrix size are ignored in the calculation.

What is the formula for Cohen's Kappa?

Kappa is calculated as κ = (Po − Pe) / (1 − Pe), where Po is the observed proportion of agreement (sum of diagonal cells divided by total N) and Pe is the expected agreement by chance (computed from the row and column marginal totals). The standard error is then used to construct the z-score and confidence interval.
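The whole computation can be sketched in a few lines. This uses the common large-sample approximation for the standard error, SE = √(Po(1 − Po)/N) / (1 − Pe); exact variance formulas (e.g. Fleiss, Cohen & Everitt 1969) differ slightly, so treat this as an approximation rather than this calculator's exact internals:

```python
import math

def cohens_kappa(matrix):
    """Unweighted Cohen's kappa from a square confusion matrix.

    Returns (kappa, Po, Pe, se, ci) where se is an approximate
    standard error and ci a 95% confidence interval.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Po: proportion of subjects on the diagonal (observed agreement).
    po = sum(matrix[i][i] for i in range(k)) / n
    # Pe: chance agreement from the row and column marginal totals.
    row = [sum(matrix[i]) for i in range(k)]
    col = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row[i] * col[i] for i in range(k)) / n**2
    kappa = (po - pe) / (1 - pe)
    # Approximate large-sample standard error and 95% CI.
    se = math.sqrt(po * (1 - po) / n) / (1 - pe)
    ci = (kappa - 1.96 * se, kappa + 1.96 * se)
    return kappa, po, pe, se, ci

# Example 2x2 table: Po = 0.7, Pe = 0.5, kappa = 0.4.
kappa, po, pe, se, ci = cohens_kappa([[20, 5], [10, 15]])
```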

What does a negative Kappa value mean?

A negative kappa indicates that the two raters agree less than would be expected by chance alone. This is rare in practice and often suggests systematic disagreement: for example, one rater consistently chooses the opposite category from the other.

When should I use weighted Kappa instead of regular Kappa?

Weighted Kappa is preferred when the rating categories are ordered (ordinal), because it gives partial credit for near-misses. For example, if categories are 'None', 'Mild', 'Moderate', and 'Severe', disagreeing by one level should count less than disagreeing by three levels. For nominal (unordered) categories, use unweighted Kappa as provided here.
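The partial-credit idea can be sketched with disagreement weights: a miss of d levels out of k − 1 possible levels is weighted d/(k − 1) (linear) or its square (quadratic), and κw = 1 − ΣwO / ΣwE. A minimal sketch (note this weighted variant is not computed by the unweighted calculator above):

```python
def weighted_kappa(matrix, weighting="linear"):
    """Weighted Cohen's kappa for ordinal categories.

    Uses disagreement weights w = d/(k-1) (linear) or (d/(k-1))**2
    (quadratic), where d is the distance between the two categories.
    kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row = [sum(matrix[i]) for i in range(k)]
    col = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if weighting == "linear" else d * d
            num += w * matrix[i][j]               # weighted observed
            den += w * row[i] * col[j] / n        # weighted expected
    return 1 - num / den
```

For a 2×2 table there is only one possible disagreement distance, so weighted and unweighted kappa coincide; the weights only matter from 3×3 upward.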

What sample size do I need for a reliable Kappa estimate?

There is no universal rule, but most researchers recommend at least 30–50 subjects per category for a stable Kappa estimate. Very small samples produce wide confidence intervals, making the kappa value unreliable. The p-value and 95% CI shown by this calculator will reflect the uncertainty caused by small sample sizes.

More Statistics Tools