Lasso Regression Calculator

Enter your X (independent) and Y (dependent) data points along with a lambda (λ) penalty strength to perform Lasso Regression. The calculator fits an L1-regularized linear model, returning the intercept, coefficients, R-squared, and residual sum of squares — potentially shrinking some coefficients to exactly zero for automatic feature selection.

Enter the dependent (response) variable values separated by commas or spaces.

Enter one set of independent (predictor) variable values separated by commas or spaces.

L1 regularization strength. Higher λ shrinks more coefficients toward zero.

Maximum iterations for coordinate descent convergence.

Step size for coordinate descent. Smaller values are more stable but slower.
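The calculator's exact solver is not shown here, but the three settings above (λ, maximum iterations, and step size) match a proximal-gradient-style iteration with soft-thresholding. The sketch below assumes that kind of solver; the names `soft_threshold` and `lasso_ista` are illustrative, not the calculator's actual internals.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: pull z toward zero by t,
    # clamping anything inside [-t, t] to exactly zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, step=0.01, max_iter=5000, tol=1e-10):
    # Minimize (1/(2n)) * ||y - b0 - X @ beta||^2 + lam * sum(|beta|).
    # Centering X and y first lets the (unpenalized) intercept be
    # recovered at the end from the means.
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n, p = Xc.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        grad = Xc.T @ (Xc @ beta - yc) / n        # gradient of the smooth part
        beta_new = soft_threshold(beta - step * grad, step * lam)
        if np.max(np.abs(beta_new - beta)) < tol:  # converged
            beta = beta_new
            break
        beta = beta_new
    intercept = y_mean - X_mean @ beta
    return intercept, beta
```

In this formulation the step size must stay below the reciprocal of the largest eigenvalue of `Xc.T @ Xc / n`, or the iteration can diverge — which is why smaller values are more stable but need more iterations.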

Results

R-Squared (R²)

--

Intercept (β₀)

--

Coefficient (β₁)

--

Residual Sum of Squares (RSS)

--

Mean Squared Error (MSE)

--

Root MSE (RMSE)

--

Data Points (n)

--

Actual vs Fitted Values

Results Table

Frequently Asked Questions

What is Lasso Regression?

Lasso (Least Absolute Shrinkage and Selection Operator) regression is a type of linear regression that adds an L1 penalty — the sum of the absolute values of the coefficients — to the loss function. This penalty shrinks less important coefficients exactly to zero, effectively performing automatic feature selection while fitting the model.
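The penalized loss described above can be written out directly. The sketch below assumes the common 1/(2n) scaling on the RSS term (conventions vary) and, as is standard, leaves the intercept out of the penalty:

```python
import numpy as np

def lasso_objective(X, y, intercept, beta, lam):
    # RSS term (scaled by 1/(2n), a common convention) plus the L1 penalty
    # on the slope coefficients; the intercept is not penalized.
    n = len(y)
    residuals = y - (intercept + X @ beta)
    return (residuals @ residuals) / (2 * n) + lam * np.sum(np.abs(beta))
```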

What is Regularized Regression (Ridge, Lasso & Elastic Net)?

Regularized regression adds a penalty term to the ordinary least squares objective to prevent overfitting. Ridge regression uses an L2 penalty (sum of squared coefficients), Lasso uses an L1 penalty (sum of absolute coefficients), and Elastic Net combines both. Lasso is unique in that it can set coefficients exactly to zero, making it ideal for sparse models with many irrelevant predictors.
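The three penalty terms can be compared directly on a made-up coefficient vector. The Elastic Net mixing shown below follows one common convention (the one scikit-learn uses); other sources parameterize the blend differently:

```python
import numpy as np

beta = np.array([2.0, -0.5, 0.0, 1.5])  # made-up coefficient vector

l1_penalty = np.sum(np.abs(beta))  # Lasso: sum of absolute coefficients
l2_penalty = np.sum(beta ** 2)     # Ridge: sum of squared coefficients

# Elastic Net blends the two with a mixing ratio alpha in [0, 1]
# (alpha = 1 recovers Lasso, alpha = 0 recovers Ridge).
alpha = 0.5
enet_penalty = alpha * l1_penalty + (1 - alpha) / 2 * l2_penalty
```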

What does the lambda (λ) parameter control in Lasso?

Lambda (λ) controls the strength of the regularization penalty. A λ of 0 reduces Lasso to ordinary linear regression. As λ increases, more coefficients are shrunk toward zero — and eventually forced to exactly zero. Choosing the right λ typically involves cross-validation to balance model fit and sparsity.
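For a single standardized predictor, the Lasso solution has a closed form — soft-thresholding of the OLS slope — which makes the effect of λ easy to see. A sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Standardize x (mean 0, unit variance) and center y.
xs = (x - x.mean()) / x.std()
yc = y - y.mean()

rho = (xs @ yc) / len(x)  # OLS slope on the standardized scale

for lam in [0.0, 0.5, 1.0, 3.0]:
    # Closed-form single-predictor Lasso: soft-threshold the OLS slope.
    beta = np.sign(rho) * max(abs(rho) - lam, 0.0)
    print(f"lambda={lam}: beta={beta:.4f}")
```

At λ = 0 the coefficient equals the OLS slope; once λ exceeds the slope's magnitude, the coefficient is forced to exactly zero.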

How does Lasso differ from Ridge Regression?

Both Lasso and Ridge add a penalty to prevent overfitting, but they differ in the type of penalty. Ridge uses L2 (squared coefficients) and shrinks all coefficients toward zero without setting any to exactly zero. Lasso uses L1 (absolute coefficients) and can drive coefficients to exactly zero, making it a built-in feature selector. Lasso is preferred when you suspect only a few predictors are truly important.

Why does Lasso produce sparse models?

The L1 penalty creates a diamond-shaped constraint region in coefficient space. Because the corners of this region lie on the axes, the optimal solution frequently lands at a corner or edge where one or more coefficients equal exactly zero. This geometric property is what allows Lasso to eliminate irrelevant features entirely, unlike Ridge's circular constraint which only shrinks — never eliminates.

What is R-squared in the context of Lasso Regression?

R-squared measures the proportion of variance in the dependent variable explained by the model, ranging from 0 to 1. In Lasso regression, a regularized model may have a slightly lower R-squared than an unregularized OLS model, but it tends to generalize better to new data by avoiding overfitting. An R-squared of 1 means a perfect fit; 0 means the model explains no variance.
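Computed from the fitted values, R-squared is one minus the ratio of residual variation to total variation about the mean; a minimal sketch:

```python
import numpy as np

def r_squared(y, y_fit):
    # 1 - (residual sum of squares) / (total sum of squares about the mean)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```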

What is the Residual Sum of Squares (RSS) used for?

RSS is the sum of the squared differences between observed Y values and the model's fitted Y values. Lasso minimizes RSS plus the L1 penalty term jointly. A lower RSS indicates a better in-sample fit, but minimizing RSS alone (without the penalty) can overfit the data — which is exactly what Lasso's regularization is designed to prevent.
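RSS, together with the MSE and RMSE shown in the results panel, follows directly from the residuals; a minimal sketch:

```python
import numpy as np

def fit_errors(y, y_fit):
    # RSS = sum of squared residuals; MSE = RSS / n; RMSE = sqrt(MSE).
    residuals = y - y_fit
    rss = float(residuals @ residuals)
    mse = rss / len(y)
    return rss, mse, mse ** 0.5
```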

When should I use Lasso instead of standard linear regression?

Use Lasso when you have many predictor variables and suspect that only a subset truly influence the outcome — Lasso will automatically zero out irrelevant ones. It is also preferred when multicollinearity is present or when you want an interpretable, sparse model. Standard OLS is fine for low-dimensional problems where all predictors are expected to matter and overfitting is not a concern.

More Statistics Tools