Cohen's Kappa Calculator

Enter your confusion matrix values into the Cohen's Kappa Calculator: fill in the cell counts for two raters across 2 to 5 categories and get back Cohen's Kappa (κ), observed agreement, expected agreement, and an interpretation of the inter-rater reliability level. Choose the number of categories, enter each cell count, and the kappa statistic is computed for you.

Select how many classification categories both raters used.

Number of subjects both raters assigned to category 1.

Number of subjects where Rater 1 chose category 1 and Rater 2 chose category 2.

Number of subjects where Rater 1 chose category 2 and Rater 2 chose category 1.

Number of subjects both raters assigned to category 2.

The remaining cells are only used when 3 or more categories are selected.
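As a rough sketch of how these counts fit together (the numbers below are hypothetical, and the row/column orientation is assumed from the field descriptions above, with rows for Rater 1 and columns for Rater 2), the entries form a square confusion matrix:

# Hypothetical 2 x 2 entry: rows are Rater 1's categories, columns are Rater 2's.
matrix = [
    [40, 6],   # both chose category 1 (40); Rater 1 chose 1, Rater 2 chose 2 (6)
    [4, 50],   # Rater 1 chose 2, Rater 2 chose 1 (4); both chose category 2 (50)
]
# With 3, 4, or 5 categories the grid simply grows to 3 x 3, 4 x 4, or 5 x 5.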

Results

The calculator reports Cohen's Kappa (κ), Observed Agreement (Po), Expected Agreement (Pe), Total Observations (N), and an overall Agreement Level, together with an Observed vs. Expected Agreement chart and a Results Table.

Frequently Asked Questions

What is Cohen's Kappa?

Cohen's Kappa (κ) is a statistical measure that quantifies the level of agreement between two raters or classifiers, correcting for the agreement that would be expected purely by chance. A value of 1 indicates perfect agreement, 0 indicates agreement equal to chance, and negative values indicate less agreement than chance.

How is Cohen's Kappa calculated?

Kappa is calculated as κ = (Po − Pe) / (1 − Pe), where Po is the observed proportional agreement (sum of diagonal cells divided by total N) and Pe is the expected proportional agreement (calculated from the marginal totals of the confusion matrix). The formula adjusts raw agreement for the level of agreement that chance alone would predict.
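As a minimal sketch of that formula (the function below is illustrative, not the calculator's own code), kappa can be computed directly from a square confusion matrix:

def cohen_kappa(matrix):
    """Cohen's kappa for a k x k confusion matrix.

    matrix[i][j] is the number of subjects Rater 1 placed in category i
    and Rater 2 placed in category j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                  # total observations N
    po = sum(matrix[i][i] for i in range(k)) / n         # observed agreement Po
    row_totals = [sum(row) for row in matrix]            # Rater 1 marginals
    col_totals = [sum(col) for col in zip(*matrix)]      # Rater 2 marginals
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # expected agreement Pe
    if pe == 1:
        return 1.0                                       # degenerate case: all counts in one cell
    return (po - pe) / (1 - pe)

# Example: 40 + 50 agreements out of 100 subjects gives kappa ≈ 0.80.
print(cohen_kappa([[40, 6], [4, 50]]))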

How do I interpret a Cohen's Kappa value?

According to Landis & Koch (1977), κ < 0 indicates no agreement, 0–0.20 is slight, 0.21–0.40 is fair, 0.41–0.60 is moderate, 0.61–0.80 is substantial, and 0.81–1.00 is almost perfect agreement. Values above 0.6 are generally considered acceptable for research purposes, and values above 0.8 are considered strong.
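One straightforward way to encode those bands (the function name is made up for illustration) is:

def interpret_kappa(kappa):
    """Return the Landis & Koch (1977) label for a kappa value."""
    if kappa < 0:
        return "No agreement"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"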

What is the difference between observed agreement and Cohen's Kappa?

Observed agreement (Po) is simply the percentage of cases where both raters agree, without any correction. Cohen's Kappa goes further by subtracting the agreement that would be expected by chance (Pe), making it a more conservative and informative measure — especially when category prevalences are unequal.
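A small worked example with hypothetical counts shows why the chance correction matters when one category dominates:

# One very common category: the raters agree on 91 of 100 subjects,
# but most of that agreement is expected by chance alone.
matrix = [[90, 5],
          [4, 1]]
n = 100
row_totals = [95, 5]                    # Rater 1 marginals
col_totals = [94, 6]                    # Rater 2 marginals
po = (90 + 1) / n                       # Po = 0.91 (91% raw agreement)
pe = (95 * 94 + 5 * 6) / (n * n)        # Pe = 0.896 (chance alone predicts ~90%)
kappa = (po - pe) / (1 - pe)            # ≈ 0.13, only "slight" agreement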

When should I use weighted Kappa instead of regular Kappa?

Weighted Kappa should be used when your categories are on an ordinal scale, meaning that some disagreements are worse than others (e.g., rating something '1' vs '5' is worse than rating it '1' vs '2'). Regular Cohen's Kappa treats all disagreements equally and is appropriate for nominal (unordered) categories.
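A possible sketch of weighted kappa (not this calculator's code) with the two common weighting schemes, linear and quadratic, looks like this:

def weighted_kappa(matrix, weighting="quadratic"):
    """Weighted Cohen's kappa for ordinal categories.

    Disagreements are penalised in proportion to the distance between
    the two ratings, either linearly or quadratically.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]            # Rater 1 marginals
    col_totals = [sum(col) for col in zip(*matrix)]      # Rater 2 marginals

    def weight(i, j):
        d = abs(i - j) / (k - 1)                         # normalised rating distance
        return d if weighting == "linear" else d * d

    observed = sum(weight(i, j) * matrix[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j] / (n * n)
                   for i in range(k) for j in range(k))
    if expected == 0:
        return 1.0                                       # degenerate: no disagreement possible by chance
    return 1 - observed / expected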

Can Cohen's Kappa be negative?

Yes. A negative kappa means the two raters agreed less often than would be expected purely by chance. This is an unusual situation and often signals systematic disagreement, a flawed rating system, or a data entry error. In practice, negative kappa values are rare with well-designed rating studies.
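A hypothetical example of systematic disagreement makes this concrete:

# Raters who almost always pick opposite categories.
matrix = [[10, 40],
          [40, 10]]
n = 100
po = (10 + 10) / n                      # Po = 0.20
pe = (50 * 50 + 50 * 50) / (n * n)      # Pe = 0.50 (both marginals are 50/50)
kappa = (po - pe) / (1 - pe)            # = -0.60, far below chance agreement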

How many categories can Cohen's Kappa handle?

Cohen's Kappa can be computed for any number of categories, as long as both raters use the same set. This calculator supports 2 to 5 categories. The confusion matrix grows with the number of categories — for k categories you need a k×k matrix of counts.
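The same formula scales to larger grids; for example, with a hypothetical 3×3 matrix:

matrix = [[20, 5, 0],
          [3, 15, 2],
          [1, 4, 10]]
n = 60                                               # total subjects
po = (20 + 15 + 10) / n                              # Po = 0.75
row_totals = [25, 20, 15]                            # Rater 1 marginals
col_totals = [24, 24, 12]                            # Rater 2 marginals
pe = (25 * 24 + 20 * 24 + 15 * 12) / (n * n)         # Pe = 0.35
kappa = (po - pe) / (1 - pe)                         # ≈ 0.62, "substantial"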

What is a good sample size for computing Cohen's Kappa?

There is no universal minimum, but larger samples yield more stable and reliable kappa estimates. A commonly cited guideline is at least 30–50 subjects per category. Very small samples can produce misleading kappa values, especially when one category is rare.
