Intraclass Correlation Calculator

Paste ratings from multiple raters as rows of space- or tab-separated values (one subject per row, one rater per column), and this Intraclass Correlation Calculator computes the ICC coefficient, interprets reliability, and breaks down the ANOVA table components: between-subjects variance, between-raters variance, and residual error. Choose your ICC model (One-Way or Two-Way) and type (Consistency or Absolute Agreement) to match your study design. You get the ICC value, a reliability interpretation, degrees of freedom, mean squares, and the F-statistic.

One-Way: each subject is rated by a different set of raters. Two-Way: the same raters rate all subjects.

Enter one subject per row. Separate rater values with spaces or tabs. All rows must have the same number of values.

Results

ICC Coefficient

--

Reliability Interpretation

--

MS Between Subjects (MSR)

--

MS Between Raters (MSC)

--

MS Residual Error (MSE)

--

F-Statistic (Between Subjects)

--

Number of Subjects (n)

--

Number of Raters (k)

--

ANOVA Variance Components

Results Table

Frequently Asked Questions

What is the Intraclass Correlation Coefficient (ICC)?

The ICC is a statistical measure of reliability and agreement for continuous measurements made by multiple raters or on multiple occasions. Values typically range from 0 (no reliability) to 1 (perfect reliability), though the estimate can be negative when raters disagree more than subjects differ. It is widely used in clinical research, psychology, and quality assessment to evaluate inter-rater reliability.

What is the difference between ICC consistency and absolute agreement?

Consistency ICC ignores systematic differences between raters — it measures whether raters rank subjects in the same order. Absolute Agreement ICC accounts for systematic biases, asking whether raters produce the same actual numeric values. Absolute agreement is the stricter criterion and is preferred when the magnitude of ratings matters, not just the ranking.
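A small numeric sketch makes the difference concrete (the toy data and variable names are assumptions for illustration, not the calculator's own code): with a constant one-point bias between two raters, the consistency ICC is perfect while the absolute-agreement ICC is not.

```python
# Toy data: rater 2 always scores 1 point higher than rater 1.
ratings = [[1, 2], [2, 3], [3, 4]]
n, k = len(ratings), len(ratings[0])          # subjects, raters
grand = sum(map(sum, ratings)) / (n * k)
row_means = [sum(row) / k for row in ratings]
col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

# Standard two-way ANOVA decomposition of the total sum of squares.
ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
msr, msc = ss_rows / (n - 1), ss_cols / (k - 1)
mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

# Consistency ignores the rater term; absolute agreement penalizes it.
icc_consistency = (msr - mse) / (msr + (k - 1) * mse)
icc_agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
# icc_consistency is 1.0 (identical ranking); icc_agreement is about 0.67
```

The constant bias leaves the ranking untouched, so only the absolute-agreement form detects it.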

When should I use a One-Way vs Two-Way ICC model?

Use the One-Way Random model (ICC 1,1) when each subject is rated by a different, randomly selected set of raters from a larger pool, and those specific raters are not of interest. Use the Two-Way model when the same set of raters rates all subjects — Mixed effects (ICC 3,1) when raters are the only raters of interest, or Random effects (ICC 2,1) when raters are a random sample of a larger pool.

How do I interpret ICC values?

Common benchmarks: ICC < 0.50 indicates poor reliability; 0.50–0.75 indicates moderate reliability; 0.75–0.90 indicates good reliability; ICC > 0.90 indicates excellent reliability. These thresholds should be interpreted in the context of your field — clinical measurement tools often require ICC > 0.90, while exploratory research may accept lower values.
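The benchmarks above can be sketched as a small helper (the function name is illustrative, not part of the calculator):

```python
def interpret_icc(icc):
    """Map an ICC value to the benchmark labels quoted above."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"
```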

How is the ICC calculated?

For the two-way models, the ICC is calculated using a two-way analysis of variance (ANOVA): the total variance is partitioned into between-subjects variance (MSR), between-raters variance (MSC), and residual error (MSE). The ICC formula combines these mean squares; for example, ICC(2,1) = (MSR − MSE) / (MSR + (k−1)·MSE + (k/n)·(MSC − MSE)), where k is the number of raters and n is the number of subjects. The one-way model uses a one-way ANOVA, which pools rater and residual variance into a single within-subjects term.
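As a minimal sketch in plain Python (not the calculator's own implementation; the function name is an assumption), the decomposition and the ICC(2,1) formula above look like this:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one subject per row, one rater per column.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters

    msr = ss_rows / (n - 1)                                     # MSR
    msc = ss_cols / (k - 1)                                     # MSC
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # MSE

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With identical ratings from every rater the function returns 1.0; a constant offset between raters pulls the value below 1 even though the ranking is perfect.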

How should I format data for this calculator?

Enter your data as a plain text table where each row represents one subject and each column represents one rater. Separate values with spaces or tabs. All rows must contain the same number of values (i.e., the same number of raters must rate every subject). A minimum of 2 raters and 3 subjects is recommended for meaningful results.
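The parsing and validation described above can be sketched as follows (a minimal illustration; the function name and error messages are assumptions):

```python
def parse_ratings(text):
    """Parse pasted text: one subject per row, whitespace-separated raters."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    if not rows:
        raise ValueError("no data entered")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of values")
    # Note: at least 2 raters and 3 subjects are recommended for a
    # meaningful ICC, as stated above.
    return [[float(value) for value in row] for row in rows]
```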

Can ICC values be negative?

Yes, ICC can theoretically be negative, which occurs when the variance within subjects is greater than the variance between subjects. This typically indicates that there is no agreement between raters — raters are essentially assigning values at random relative to the true subject differences. In practice, a negative ICC should prompt you to check your data for errors.
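A minimal sketch of this situation (toy data assumed for illustration), using the one-way formula ICC(1,1) = (MSB − MSW) / (MSB + (k−1)·MSW):

```python
# Subjects have identical means, so all variance is within-subject
# rater disagreement and the one-way ICC comes out negative.
ratings = [[1, 5], [5, 1], [1, 5]]
n, k = len(ratings), len(ratings[0])          # subjects, raters
grand = sum(map(sum, ratings)) / (n * k)
row_means = [sum(row) / k for row in ratings]

msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # between subjects
msw = sum((x - m) ** 2                                          # within subjects
          for row, m in zip(ratings, row_means)
          for x in row) / (n * (k - 1))

icc_1_1 = (msb - msw) / (msb + (k - 1) * msw)
# msb is 0 while msw is 8, so icc_1_1 is negative (exactly -1 here)
```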

What is the difference between ICC and Pearson correlation?

Pearson correlation measures the linear relationship between two variables but is insensitive to systematic bias between raters. ICC measures actual agreement — two raters can have a Pearson correlation of 1.0 yet a low ICC if one consistently rates 10 points higher than the other. ICC also generalizes to more than two raters, while Pearson is limited to pairwise comparisons.
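This can be demonstrated numerically. The sketch below is self-contained (helper names are illustrative; the ICC function repeats the two-way ANOVA decomposition described earlier in this FAQ):

```python
def pearson(x, y):
    """Sample Pearson correlation between two raters' score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def icc_2_1(ratings):
    """ICC(2,1), absolute agreement, from two-way ANOVA mean squares."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    msr, msc = ss_rows / (n - 1), ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rater_a = [1, 2, 3, 4, 5]
rater_b = [11, 12, 13, 14, 15]   # same ranking, constant +10 bias
ratings = [[a, b] for a, b in zip(rater_a, rater_b)]
# pearson(rater_a, rater_b) is 1.0, while icc_2_1(ratings) is about 0.05:
# the constant 10-point bias counts as disagreement for absolute agreement.
```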
