Tetrachoric Correlation Calculator

Enter the four cells of your 2×2 contingency table: cell A (both present), cell B (first present, second absent), cell C (first absent, second present), and cell D (both absent). The Tetrachoric Correlation Calculator then estimates the underlying correlation coefficient (r_tet) between two latent continuous variables. You also get the standard error, a 95% confidence interval, and a chi-square test of independence to assess statistical significance.

Results

Tetrachoric Correlation (r_tet)

Standard Error (SE)

CI Lower Bound

CI Upper Bound

Chi-Square (χ²)

P-Value

Total N

Interpretation

Frequently Asked Questions

What is a tetrachoric correlation?

The tetrachoric correlation estimates the Pearson correlation between two latent (underlying) continuous variables that have each been dichotomized into binary (0/1) outcomes. It is commonly used in psychometrics, epidemiology, and genetics when you observe two dichotomous variables but assume they reflect underlying continuous traits.

When should I use tetrachoric correlation instead of Pearson or phi?

Use tetrachoric correlation when both observed variables are dichotomous but you believe they represent continuous underlying constructs — for example, pass/fail scores on two tests, or presence/absence of two symptoms. The phi coefficient is appropriate when the dichotomy is truly categorical, while tetrachoric is better when the dichotomy is artificial (e.g., a threshold on a continuous scale).
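To make the difference concrete, the sketch below computes both coefficients on the same table. The function names are illustrative, the table values are made up for the example, and the tetrachoric value uses the cosine-pi approximation described on this page rather than an iterative estimate.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi: the Pearson correlation of the two observed 0/1 variables."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; phi is undefined")
    return (a * d - b * c) / math.sqrt(denom)

def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation; this sketch assumes b * c > 0."""
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

# Hypothetical table: A=40, B=10, C=10, D=40
a, b, c, d = 40, 10, 10, 40
phi = phi_coefficient(a, b, c, d)
r_tet = tetrachoric_approx(a, b, c, d)
print(f"phi = {phi:.3f}, r_tet = {r_tet:.3f}")  # phi = 0.600, r_tet = 0.809
```

On this table the tetrachoric estimate is larger than phi, because it targets the correlation between the latent traits rather than between the observed 0/1 codes.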

What do cells A, B, C, and D represent in the 2×2 table?

Cell A is the count where both variables equal 1, Cell B is where Variable 1 = 1 and Variable 2 = 0, Cell C is where Variable 1 = 0 and Variable 2 = 1, and Cell D is where both variables equal 0. Together, A + B + C + D equals your total sample size N.
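If your data are raw 0/1 observations rather than counts, the four cells can be tallied as in this minimal sketch. The function name `two_by_two` and the sample data are hypothetical.

```python
def two_by_two(x, y):
    """Count cells A, B, C, D from two equal-length sequences of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1          # both variables equal 1
        elif xi == 1 and yi == 0:
            b += 1          # Variable 1 = 1, Variable 2 = 0
        elif xi == 0 and yi == 1:
            c += 1          # Variable 1 = 0, Variable 2 = 1
        elif xi == 0 and yi == 0:
            d += 1          # both variables equal 0
        else:
            raise ValueError("values must be 0 or 1")
    return a, b, c, d

x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 1, 0]
a, b, c, d = two_by_two(x, y)
print(a, b, c, d, a + b + c + d)  # 2 1 1 2 6
```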

How is the tetrachoric correlation calculated?

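The calculator uses the cosine-pi approximation: r_tet ≈ cos(π / (1 + √(AD/BC))). More precise estimates require iterative numerical methods based on the bivariate normal distribution; this tool applies the approximation formula instead. The approximation is closest to the iterative estimate when both variables split near 50/50 and becomes less accurate as the splits grow more extreme. Note that the ratio AD/BC cannot be evaluated as written when B or C is zero.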
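The formula can be sketched in a few lines of Python. The function name is illustrative, and the handling of zero cells (returning the limiting value, or raising when the result is undefined) is an assumption of this sketch, not a description of what the calculator itself does.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation: cos(pi / (1 + sqrt(AD / BC)))."""
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("AD and BC are both zero; the estimate is undefined")
    if bc == 0:
        # Assumption: use the limiting value as AD/BC grows without bound.
        return 1.0
    return math.cos(math.pi / (1 + math.sqrt(ad / bc)))

print(round(tetrachoric_cos_pi(40, 10, 10, 40), 4))  # 0.809
```

For A=40, B=10, C=10, D=40 the ratio AD/BC is 16, so the estimate is cos(π/5) ≈ 0.809. A table with equal counts in all four cells gives cos(π/2), which is zero up to floating-point rounding.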

What is the standard error of the tetrachoric correlation?

The standard error (SE) quantifies the sampling uncertainty in your r_tet estimate. Larger samples yield smaller SEs, all else being equal. The SE is used to compute the confidence interval, which indicates how precisely the true tetrachoric correlation is estimated from your data.
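This page does not state which SE formula the calculator uses. As a stand-in, the sketch below estimates the SE and a 95% percentile interval by bootstrap resampling of the table, which is a different technique and will not reproduce the calculator's numbers exactly. The function names, the number of replicates, and the seed are all illustrative choices.

```python
import math
import random
import statistics

def cos_pi(a, b, c, d):
    """Cosine-pi approximation; returns None when the estimate is undefined."""
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        return None
    if bc == 0:
        return 1.0  # assumption: limiting value when B or C is zero
    return math.cos(math.pi / (1 + math.sqrt(ad / bc)))

def bootstrap_se(a, b, c, d, reps=2000, seed=0):
    """Resample N observations from the observed cell proportions."""
    rng = random.Random(seed)
    n = a + b + c + d
    estimates = []
    for _ in range(reps):
        draws = rng.choices(range(4), weights=[a, b, c, d], k=n)
        counts = [draws.count(i) for i in range(4)]
        est = cos_pi(*counts)
        if est is not None:
            estimates.append(est)
    estimates.sort()
    se = statistics.stdev(estimates)
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return se, lower, upper

se, lower, upper = bootstrap_se(40, 10, 10, 40)
print(f"SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Fixing the seed makes the output reproducible from run to run; a different seed gives slightly different values.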

How do I interpret the p-value in this calculator?

The p-value comes from a chi-square test of independence applied to the 2×2 table. A p-value below your chosen alpha level (commonly 0.05) is evidence against independence of the two binary variables, which is consistent with a nonzero correlation between the underlying continuous traits. The p-value does not measure how strong that correlation is; use r_tet and its confidence interval for that.
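A minimal sketch of the test for a 2×2 table follows. It assumes the plain Pearson chi-square statistic with one degree of freedom and no continuity correction; this page does not say whether the calculator applies a correction. The function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table (df = 1)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With one degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p = chi_square_2x2(40, 10, 10, 40)
print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
```

For A=40, B=10, C=10, D=40 the statistic is 36, and the p-value is far below 0.05.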

What sample size do I need for reliable tetrachoric correlation estimates?

A minimum of 50–100 observations is generally recommended, but larger samples (200+) produce more stable estimates. Very small cell counts (below 5 in any cell) can lead to unstable or inaccurate results, so verify that all four cells have sufficient counts before interpreting r_tet.
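A quick pre-check along these lines can flag tables that fall short of the guidance above. The function name is hypothetical, and the default thresholds (5 per cell, 50 in total) simply restate the rules of thumb from this answer.

```python
def check_cells(a, b, c, d, min_cell=5, min_n=50):
    """Return a list of warnings for small cells or a small total sample."""
    warnings = []
    n = a + b + c + d
    if n < min_n:
        warnings.append(f"total N = {n} is below {min_n}")
    for name, count in zip("ABCD", (a, b, c, d)):
        if count < min_cell:
            warnings.append(f"cell {name} = {count} is below {min_cell}")
    return warnings

print(check_cells(40, 10, 10, 40))  # []
print(check_cells(3, 10, 10, 20))
```

An empty list means no warning was triggered; it does not by itself guarantee a stable estimate.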

Can tetrachoric correlation be used in factor analysis?

Yes. Tetrachoric correlations, the special case of polychoric correlations for two-category items, are commonly used to build correlation matrices for factor analysis when items are dichotomous, such as in educational testing or psychological questionnaires. Software like R's 'psych' package or LISREL uses these matrices to perform factor analysis appropriate for binary item data.
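As a rough illustration of the input to such an analysis, the sketch below assembles pairwise tetrachoric correlations into an item-by-item matrix. It uses the cosine-pi approximation from this page rather than the iterative estimators that dedicated packages use. The function names and the tiny data set are made up, and the factor analysis step itself is not shown.

```python
import math

def cos_pi(a, b, c, d):
    """Cosine-pi approximation; raises when the estimate is undefined."""
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("AD and BC are both zero; the estimate is undefined")
    if bc == 0:
        return 1.0  # assumption: limiting value when B or C is zero
    return math.cos(math.pi / (1 + math.sqrt(ad / bc)))

def tetrachoric_matrix(items):
    """items: list of columns, each a list of 0/1 responses of equal length."""
    k = len(items)
    matrix = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            pairs = list(zip(items[i], items[j]))
            a = pairs.count((1, 1))
            b = pairs.count((1, 0))
            c = pairs.count((0, 1))
            d = pairs.count((0, 0))
            matrix[i][j] = matrix[j][i] = cos_pi(a, b, c, d)
    return matrix

# Hypothetical responses of 8 people to 3 binary items
items = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
for row in tetrachoric_matrix(items):
    print([round(value, 3) for value in row])
```

Because each entry is estimated separately from its own 2×2 table, a matrix built this way is not guaranteed to be positive definite, so it may need checking before it is passed to a factor analysis routine.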
