Least Squares Regression Calculator

Enter your X and Y data pairs into the Least Squares Regression Calculator to find the line of best fit. Input your data points as comma-separated values and get back the regression equation (Y = a + bX), Pearson correlation coefficient (r), coefficient of determination (R²), and a table of predicted values and residuals.

Enter X values separated by commas, spaces, or new lines.

Enter Y values separated by commas, spaces, or new lines. Must match the count of X values.

Enter an X value to predict its corresponding Y using the regression equation.

Check this to force the regression line through the origin.

Results

Regression Equation

Slope (b)

Y-Intercept (a)

Pearson Correlation (r)

Coefficient of Determination (R²)

Predicted Y (at given X)

Number of Data Points (n)

Scatter Plot with Regression Line

Results Table

Frequently Asked Questions

What is least squares regression?

Least squares regression is a statistical method for finding the straight line that best fits a set of data points. It works by minimizing the sum of the squared differences (residuals) between each observed Y value and the Y value predicted by the line. The result is the 'line of best fit' described by the equation Y = a + bX.

What is 'ordinary least squares' (OLS)?

Ordinary least squares (OLS) is the most common technique used to estimate the coefficients of a linear regression model. It finds values for the slope (b) and intercept (a) that minimize the sum of squared residuals. OLS is unbiased and efficient under standard regression assumptions, making it the default method for simple linear regression.

How do I calculate the regression equation Y = a + bX?

The slope b is calculated as b = [n·Σ(XY) − ΣX·ΣY] / [n·Σ(X²) − (ΣX)²], and the intercept a is calculated as a = (ΣY − b·ΣX) / n. Once you have a and b, you can plug any X value into Y = a + bX to predict the corresponding Y. This calculator performs all these steps automatically.
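The formulas above can be sketched in a few lines of Python (the function name and sample data are illustrative, not part of this calculator):

```python
def fit_line(xs, ys):
    """Return (a, b) for Y = a + bX using the least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = [n·Σ(XY) − ΣX·ΣY] / [n·Σ(X²) − (ΣX)²]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = (ΣY − b·ΣX) / n
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(a, b)  # perfectly linear data: a = 0.0, b = 2.0
```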

What does the Pearson correlation coefficient (r) tell me?

The Pearson correlation coefficient r measures the strength and direction of the linear relationship between X and Y. It ranges from −1 to +1. A value near +1 indicates a strong positive relationship, near −1 indicates a strong negative relationship, and near 0 suggests little to no linear correlation.
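A minimal sketch of the standard computational formula for r (the sample data here is illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: r = [n·Σ(XY) − ΣX·ΣY] / √([n·ΣX² − (ΣX)²][n·ΣY² − (ΣY)²])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfect positive relationship: 1.0
```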

What is R² (coefficient of determination) and how is it interpreted?

R² is the square of the Pearson correlation coefficient and represents the proportion of variance in Y that is explained by X through the regression line. For example, an R² of 0.85 means that 85% of the variation in Y is accounted for by X. Higher R² values generally indicate a better-fitting model.
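An equivalent way to compute R² is from the sums of squares, 1 − SS_res / SS_tot, which for simple linear regression with an intercept gives the same value as r². A sketch under that definition (helper name and data are illustrative):

```python
def r_squared(xs, ys):
    """R² = 1 - SS_res / SS_tot; equals r² for simple linear regression."""
    n = len(xs)
    # Fit Y = a + bX with the usual least squares formulas.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    mean_y = sum_y / n
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total variation
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3], [2, 4, 6]))  # perfect fit: 1.0
```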

What are residuals and why do they matter?

A residual is the difference between an observed Y value and the Y value predicted by the regression line (Residual = Y − Ŷ). Residuals help you assess how well the model fits the data. Ideally, residuals should be small, randomly distributed, and show no obvious pattern — large or patterned residuals suggest the linear model may not be appropriate.
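The residual formula is a one-liner; a quick sketch with illustrative coefficients and data:

```python
def residuals(xs, ys, a, b):
    """Residual = Y - Ŷ, where Ŷ = a + b·X is the fitted value."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Small, patternless residuals suggest the line fits well.
res = residuals([1, 2, 3], [2.1, 3.9, 6.0], a=0.0, b=2.0)
print([round(r, 1) for r in res])  # [0.1, -0.1, 0.0]
```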

When should I force the Y-intercept to zero?

You should force the Y-intercept to zero (a = 0) only when it is theoretically certain that Y must equal zero when X equals zero, for example when modeling distance traveled as a function of time, since distance is necessarily zero at time zero. In most practical cases, it is better to let the intercept be estimated freely from the data to avoid bias.
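When the intercept is forced to zero, the least squares slope simplifies to b = Σ(XY) / Σ(X²). A sketch of that special case (function name and data are illustrative):

```python
def slope_through_origin(xs, ys):
    """Least squares slope for Y = b·X (intercept forced to zero): b = Σ(XY) / Σ(X²)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# e.g. distance vs. time at a constant speed, where distance must be 0 at time 0
print(slope_through_origin([1, 2, 4], [3, 6, 12]))  # 3.0
```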

How many data points do I need for linear regression?

You need at least 2 data points to fit a line, but more data produces more reliable and statistically meaningful results. A general rule of thumb is to have at least 10 to 20 observations for a stable simple linear regression. With very few points, the regression line may fit the sample data well but generalize poorly to new data.

More Math Tools