Principal Component Analysis Calculator

The Principal Component Analysis Calculator reduces high-dimensional data to a small number of components that capture most of its variance. Enter your data matrix (rows = observations, columns = variables) as comma- or tab-separated values, choose the analysis method (correlation or covariance), and select the number of components to extract. The calculator returns eigenvalues, explained variance percentages, cumulative variance, and a component loading summary, which makes it suitable for exploratory data analysis and dimensionality reduction.

Each row is one observation. Each column is one variable. Numeric values only.
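
As an illustration of the expected input format, here is a minimal Python sketch that reads comma- or tab-separated text into a numeric matrix. The function name parse_matrix and the sample values are hypothetical, and this is not necessarily how the calculator parses its input.

```python
import re

def parse_matrix(text):
    """Parse comma- or tab-separated numeric rows into a list of float lists."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = [c.strip() for c in re.split(r"[,\t]", line)]
        # float() raises ValueError on non-numeric cells
        rows.append([float(c) for c in cells])
    if len({len(r) for r in rows}) != 1:
        raise ValueError("expected a non-empty matrix with equal-length rows")
    return rows

# Hypothetical input: three observations of two variables, mixed separators
data = parse_matrix("2.5,2.4\n0.5,0.7\n2.2\t2.9")
```

Rejecting ragged rows early keeps every column aligned with exactly one variable.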

Use Correlation when variables are on different scales. Use Covariance when variables share the same scale.

How many principal components to display in the output.

Results

Variance Explained by PC1

Eigenvalue PC1

Eigenvalue PC2

Variance Explained by PC2

Cumulative Variance (PC1+PC2)

Total Variables (Max Components)

Explained Variance per Principal Component (Scree Plot)

Results Table

Frequently Asked Questions

What is Principal Component Analysis (PCA)?

PCA is a statistical technique that transforms a dataset with many correlated variables into a smaller set of uncorrelated variables called principal components. Each component is a linear combination of the original variables. The first component captures the largest possible share of the variance, and each subsequent component captures the largest share of the remaining variance while staying uncorrelated with the earlier ones. PCA is widely used for dimensionality reduction, visualization, and noise filtering in data science and machine learning.
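
The transformation can be sketched in a few lines of Python with NumPy. This is a minimal illustration based on an eigendecomposition of the covariance or correlation matrix, not necessarily the method this calculator uses. The function name pca, its method parameter, and the sample data are assumptions made for the example.

```python
import numpy as np

def pca(X, method="correlation"):
    """Return eigenvalues, eigenvectors, and scores, sorted by descending eigenvalue."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                  # center each column
    if method == "correlation":
        Z = Z / X.std(axis=0, ddof=1)       # standardize to unit variance
    C = np.cov(Z, rowvar=False)             # p x p covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh is meant for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                    # observations expressed in component space
    return eigvals, eigvecs, scores

# Hypothetical data: 6 observations, 3 variables
X = [[2.5, 2.4, 1.0],
     [0.5, 0.7, 3.1],
     [2.2, 2.9, 0.8],
     [1.9, 2.2, 2.5],
     [3.1, 3.0, 1.9],
     [2.3, 2.7, 3.0]]
eigvals, eigvecs, scores = pca(X, method="correlation")
```

The columns of scores are uncorrelated with one another, and the variance of each column equals the corresponding eigenvalue.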

What is an eigenvalue in PCA?

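An eigenvalue represents the amount of variance captured by its corresponding principal component. A larger eigenvalue means that component explains more of the total variance in the dataset. A common rule of thumb (the Kaiser criterion) is to retain components with eigenvalues greater than 1 when using the correlation matrix. The reasoning is that each standardized variable contributes a variance of exactly 1, so a component with an eigenvalue below 1 explains less than a single original variable does.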
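
To make the arithmetic concrete, the sketch below converts a list of eigenvalues into explained-variance percentages and applies the Kaiser criterion. The eigenvalues are made up for illustration, and the helper names explained_variance and kaiser_retained are hypothetical.

```python
def explained_variance(eigenvalues):
    """Percentage of total variance explained by each component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

def kaiser_retained(eigenvalues):
    """1-based indices of components whose eigenvalue exceeds 1."""
    return [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

# Made-up eigenvalues from a correlation-matrix PCA of 4 variables (they sum to 4)
eigenvalues = [2.0, 1.5, 0.5, 0.0]
percent = explained_variance(eigenvalues)   # [50.0, 37.5, 12.5, 0.0]
kept = kaiser_retained(eigenvalues)         # [1, 2]
```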

What is a scree plot?

A scree plot is a bar or line chart that displays eigenvalues (or variance explained) for each principal component in descending order. It helps you decide how many components to retain by looking for an 'elbow' — the point where adding more components yields diminishing returns in explained variance.

What is the difference between Correlation and Covariance PCA methods?

When you use the Correlation method, variables are standardized to have mean 0 and standard deviation 1 before PCA is applied. This is appropriate when your variables are measured in different units or on very different scales. The Covariance method only centers the variables (subtracts the mean) and is suitable when all variables are measured in the same units on comparable scales. Without standardization, variables with larger variances dominate the leading components.
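
The effect of scale is easy to demonstrate. The following NumPy sketch, using hypothetical data, multiplies one variable by 1000 (as if converting meters to millimeters) and compares the eigenvalues from both methods. The helper name sorted_eigenvalues is an assumption for the example.

```python
import numpy as np

def sorted_eigenvalues(M):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]

# Hypothetical data: 6 observations, 3 variables
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 3.1],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 2.5],
              [3.1, 3.0, 1.9],
              [2.3, 2.7, 3.0]])

X_scaled = X.copy()
X_scaled[:, 0] *= 1000.0   # change the units of the first variable only

cov_eigs = sorted_eigenvalues(np.cov(X, rowvar=False))
cov_eigs_scaled = sorted_eigenvalues(np.cov(X_scaled, rowvar=False))
corr_eigs = sorted_eigenvalues(np.corrcoef(X, rowvar=False))
corr_eigs_scaled = sorted_eigenvalues(np.corrcoef(X_scaled, rowvar=False))

# Share of total variance taken by PC1 under the covariance method after rescaling
pc1_share_scaled = cov_eigs_scaled[0] / cov_eigs_scaled.sum()
```

In this sketch the correlation eigenvalues are unchanged by the unit conversion, while under the covariance method the rescaled variable accounts for almost all of PC1.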

How many principal components should I keep?

There are several common criteria: the Kaiser criterion (keep components with eigenvalue > 1), the scree plot elbow rule, or a cumulative variance threshold (e.g., retain enough components to explain 80–95% of total variance). The right number depends on your goals — visualization typically uses 2–3 components, while preprocessing for machine learning may require more.
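
The cumulative variance rule can be sketched as follows. The function name components_for_threshold and the example eigenvalues are hypothetical, and the threshold is a choice you make for your own analysis.

```python
from itertools import accumulate

def components_for_threshold(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold (eigenvalues must be sorted descending)."""
    total = sum(eigenvalues)
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total >= threshold:
            return k
    return len(eigenvalues)  # fallback in case of floating-point rounding

# Made-up eigenvalues: cumulative shares are 0.50, 0.875, 1.00
eigenvalues = [2.0, 1.5, 0.5]
k80 = components_for_threshold(eigenvalues, threshold=0.80)
k90 = components_for_threshold(eigenvalues, threshold=0.90)
```

With these values, two components are enough for an 80% threshold, while a 90% threshold requires all three.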

What are component loadings?

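Component loadings describe the relationship between each original variable and a principal component. Under the Correlation method, a loading (the eigenvector coefficient scaled by the square root of the eigenvalue) equals the correlation between the variable and the component. Some tools instead report the unscaled eigenvector coefficients. Large absolute loadings indicate that a variable contributes strongly to that component. Loadings help you interpret what each principal component represents in terms of the original variables.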
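
As a sketch of one common convention, the NumPy example below computes loadings for a correlation-matrix PCA by scaling each eigenvector by the square root of its eigenvalue. The data is hypothetical, and this calculator may define or scale its loadings differently.

```python
import numpy as np

# Hypothetical data: 6 observations, 3 variables
X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 3.1],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 2.5],
              [3.1, 3.0, 1.9],
              [2.3, 2.7, 3.0]])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(X, rowvar=False)                   # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
eigvals = np.clip(eigvals[order], 0.0, None)       # guard against tiny negative rounding
eigvecs = eigvecs[:, order]

# Rows are variables, columns are components
loadings = eigvecs * np.sqrt(eigvals)
scores = Z @ eigvecs

# Correlation between variable 0 and the PC1 scores, for comparison with loadings[0, 0]
check = np.corrcoef(X[:, 0], scores[:, 0])[0, 1]
```

Under this convention, the squared loadings in each column sum to that component's eigenvalue, and the squared loadings in each row sum to 1 when all components are kept.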

What is a PCA biplot?

A PCA biplot combines a scatter plot of observations (scores) in principal component space with arrows representing the original variables (loadings). The direction and length of each arrow indicate how strongly and in which direction that variable contributes to the two displayed components. Biplots are useful for simultaneously visualizing structure among observations and relationships among variables.

Can PCA be used for datasets with missing data?

Standard PCA requires complete data. Common approaches for handling missing values include listwise deletion (removing rows with any missing value) and mean imputation (replacing missing values with the column mean). Mean imputation shrinks the variance of the imputed column and weakens its correlations with other variables, so it is best reserved for a small share of missing values. For datasets with substantial missing data, more advanced methods such as expectation-maximization PCA may be preferable, but these are beyond the scope of a basic online calculator.
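
Both simple approaches can be sketched in plain Python. The example assumes missing values are encoded as NaN. The function names listwise_delete and mean_impute and the sample rows are hypothetical.

```python
import math

def listwise_delete(rows):
    """Keep only the rows that contain no missing (NaN) values."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def mean_impute(rows):
    """Replace each NaN with the mean of the observed values in its column.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed))
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]

nan = float("nan")
rows = [[1.0, 2.0], [nan, 4.0], [3.0, 6.0]]   # hypothetical data with one gap
complete = listwise_delete(rows)   # [[1.0, 2.0], [3.0, 6.0]]
imputed = mean_impute(rows)        # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```

Listwise deletion discards the whole second observation, while mean imputation keeps it and fills the gap with the column mean of 2.0.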
