Cluster Analysis Calculator

Enter your dataset as comma- or newline-separated numeric values, choose your number of clusters (k), and the Cluster Analysis Calculator runs k-means clustering to group similar observations. You get the cluster assignments, cluster centers (centroids), the within-cluster sum of squared errors (SSE), and a breakdown of how many points fall into each cluster — all visualized in a chart.

How many groups to partition your data into. Try 2–6 for most datasets.

Maximum number of algorithm iterations before stopping.

Number of times the algorithm restarts with different initial centroids. The best result is kept.

Recommended when variables are on different scales.

Enter numeric values separated by commas or new lines. Non-numeric and empty cells are ignored.

Results

Total Within-Cluster SSE

--

Clusters Found

--

Total Data Points

--

Largest Cluster Size

--

Smallest Cluster Size

--

Cluster Centroids (raw)

--

Number of Points per Cluster

Results Table

Frequently Asked Questions

What is k-means cluster analysis?

K-means clustering is an unsupervised machine learning algorithm that partitions a dataset into k groups (clusters) by minimizing the total within-cluster sum of squared distances from each point to its cluster centroid. Points in the same cluster are more similar to each other than to points in other clusters.
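The assignment/update loop at the heart of k-means can be sketched in a few lines of Python. This is an illustrative helper for the one-dimensional case this calculator handles, not the calculator's actual code:

```python
import random

def kmeans_1d(data, k, max_iter=100, seed=0):
    """Minimal 1-D k-means: returns (centroids, assignments, total SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)          # random distinct starting centroids
    for _ in range(max_iter):
        # Assignment step: send each point to its nearest centroid.
        assign = [min(range(k), key=lambda j: (x - centroids[j]) ** 2)
                  for x in data]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:                 # converged: nothing moved
            break
        centroids = new
    sse = sum((x - centroids[a]) ** 2 for x, a in zip(data, assign))
    return centroids, assign, sse
```

For example, `kmeans_1d([1, 2, 9, 10], 2)` converges to centroids 1.5 and 9.5 with a total SSE of 1.0.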

Why is it important to scale the data before clustering?

If your variables are measured on very different scales (e.g. age in years vs. income in thousands), variables with larger numeric ranges will dominate the distance calculations unfairly. Standardizing (z-score normalization) puts all variables on the same scale so each contributes equally to the clustering.
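Standardization itself is one operation per value: subtract the mean, divide by the standard deviation. A sketch using only the Python standard library (the variable names are illustrative):

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

ages    = [25, 35, 45]            # range of about 20
incomes = [30000, 50000, 70000]   # range of about 40000: would dominate raw distances
print(zscore(ages))               # [-1.0, 0.0, 1.0]
print(zscore(incomes))            # [-1.0, 0.0, 1.0] -- now on the same scale as ages
```

After standardizing, a one-unit difference means "one standard deviation" for every variable, so no single variable dominates the distance calculation.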

How do I choose the right number of clusters (k)?

A common approach is the elbow method — run k-means for different values of k and plot the total SSE. The 'elbow' point where SSE stops decreasing steeply suggests a good k. Domain knowledge and interpretability should also guide your choice.
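The elbow method is just a loop over candidate values of k. Below is a sketch for the one-dimensional case using a compact k-means run per k (illustrative code, not this calculator's implementation):

```python
import random

def kmeans_sse(data, k, seed=0):
    """One compact 1-D k-means run; returns the final total within-cluster SSE."""
    rng = random.Random(seed)
    cent = rng.sample(data, k)
    for _ in range(100):
        assign = [min(range(k), key=lambda j: (x - cent[j]) ** 2) for x in data]
        new = []
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            new.append(sum(members) / len(members) if members else cent[j])
        if new == cent:
            break
        cent = new
    return sum((x - cent[a]) ** 2 for x, a in zip(data, assign))

data = [2, 3, 4, 20, 21, 22, 40, 41, 42]     # three visually obvious groups
for k in range(1, 6):
    print(k, round(kmeans_sse(data, k), 1))  # plot these SSE values and look for the elbow
```

Plotting SSE against k, the curve drops steeply while adding clusters still splits genuinely separate groups, then flattens once k exceeds the natural number of groups.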

Why might I get different cluster results each time I run the algorithm?

K-means initializes centroids randomly, so different runs can converge to different local minima. The 'Repeats' setting reruns the algorithm multiple times with different random starts and keeps the result with the lowest total SSE, making the solution more stable.
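The restart logic amounts to "run several times, keep the best". A sketch of that idea, with `kmeans_once` as a hypothetical single-run helper rather than this calculator's code:

```python
import random

def kmeans_once(data, k, seed):
    """One 1-D k-means run from a seeded random start; returns (sse, centroids)."""
    rng = random.Random(seed)
    cent = rng.sample(data, k)
    for _ in range(100):
        assign = [min(range(k), key=lambda j: (x - cent[j]) ** 2) for x in data]
        new = []
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            new.append(sum(members) / len(members) if members else cent[j])
        if new == cent:
            break
        cent = new
    sse = sum((x - cent[a]) ** 2 for x, a in zip(data, assign))
    return sse, cent

def kmeans_repeats(data, k, repeats=10):
    """Rerun with different random starts; keep the lowest-SSE solution."""
    return min(kmeans_once(data, k, seed) for seed in range(repeats))
```

Because tuples compare by their first element, `min` here selects the run with the lowest SSE — exactly what the 'Repeats' setting does.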

What is SSE (Sum of Squared Errors) in clustering?

SSE measures the total squared distance between each data point and the centroid of its assigned cluster. A lower SSE means the clusters are tighter and more compact. It is the main objective function that k-means minimizes.
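Given the final centroids, SSE is a single sum. A sketch for the one-dimensional case (`total_sse` is a hypothetical helper name):

```python
def total_sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)

# Two well-placed centroids give tight clusters and a low SSE:
print(total_sse([1, 2, 9, 10], [1.5, 9.5]))   # 1.0
# The same points forced onto a single centroid (their overall mean) score far worse:
print(total_sse([1, 2, 9, 10], [5.5]))        # 65.0
```

At convergence each point is assigned to its nearest centroid, so "distance to nearest centroid" and "distance to the assigned cluster's centroid" coincide.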

What data format should I use as input?

Enter numeric values separated by commas or new lines. For example: 2, 5, 10, 11, 20. Non-numeric characters and empty cells are automatically ignored. This calculator works on a single numeric variable; for two-variable clustering, enter one comma-separated x,y pair per line.

What is a cluster centroid?

A centroid is the mean (average) of all data points assigned to a cluster. In each iteration, k-means recalculates centroids after reassigning points, repeating until assignments no longer change (convergence).

Can I use this calculator to detect outliers?

Yes. Points that are very far from their assigned centroid relative to the typical within-cluster spread may be considered outliers. A high SSE contribution from a single point is a strong signal that it doesn't fit well into any cluster.
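One simple flagging rule, sketched below for the one-dimensional case (the `z=2` cutoff is a common rule of thumb, not something this calculator applies automatically):

```python
from statistics import mean, stdev

def flag_outliers(data, centroids, z=2.0):
    """Flag points whose distance to the nearest centroid exceeds
    mean + z * stdev of all nearest-centroid distances."""
    dists = [min(abs(x - c) for c in centroids) for x in data]
    cutoff = mean(dists) + z * stdev(dists)
    return [x for x, d in zip(data, dists) if d > cutoff]

# 50 sits far from both centroids, while every other point is within 1 unit:
print(flag_outliers([1, 2, 3, 9, 10, 11, 50], [2, 10]))   # [50]
```

Points that survive the cutoff fit some cluster reasonably well; anything flagged contributes disproportionately to the total SSE and deserves a closer look.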

More Statistics Tools