Logistic Regression Calculator

Enter your binary outcome data (0s and 1s) and a corresponding predictor variable (X values) to fit a logistic regression model. This calculator estimates the intercept (β₀) and coefficient (β₁), computes the log-likelihood, and plots the S-curve probability model — P = 1 / (1 + e^−(β₀ + β₁x)). Paste your data below, one value per line, to see your model results.

Enter the independent variable values, one per line. These can be continuous or binary.

Enter binary outcome values (0 = failure, 1 = success), one per line. Must match the number of X values.

Optional: enter a specific X value to compute the predicted probability.

Results

Predicted P(Y=1) at X

--

Intercept (β₀)

--

Coefficient (β₁)

--

Log-Likelihood

--

Null Deviance

--

Residual Deviance

--

AIC

--

Number of Observations

--

Successes (Y=1)

--

Failures (Y=0)

--

Logistic Regression S-Curve

Results Table

Frequently Asked Questions

What is logistic regression and when should I use it?

Logistic regression is a statistical method used to model the probability of a binary outcome (0 or 1) based on one or more predictor variables. You should use it when your dependent variable has exactly two categories — such as pass/fail, disease/no disease, or purchase/no purchase — and you want to understand how predictors influence the likelihood of each outcome.

What do the β₀ and β₁ coefficients mean?

β₀ (the intercept) represents the log-odds of the outcome when the predictor X equals zero. β₁ (the slope coefficient) represents the change in log-odds for each one-unit increase in X. A positive β₁ means the probability of Y=1 increases as X increases; a negative β₁ means the opposite.
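Because the coefficients live on the log-odds scale, exponentiating them gives more interpretable quantities. A minimal sketch, using hypothetical coefficient values (β₀ = −3, β₁ = 0.1) rather than output from any particular dataset:

```python
import math

# Hypothetical fitted coefficients (for illustration only)
beta0 = -3.0   # intercept: log-odds of Y=1 when X = 0
beta1 = 0.1    # slope: change in log-odds per one-unit increase in X

# The odds of Y=1 at X = 0 are e^beta0
odds_at_zero = math.exp(beta0)

# Each one-unit increase in X multiplies the odds by e^beta1 (the odds ratio)
odds_ratio = math.exp(beta1)

print(f"odds at X=0: {odds_at_zero:.4f}")
print(f"odds ratio per unit of X: {odds_ratio:.4f}")
```

Here the positive β₁ yields an odds ratio greater than 1, matching the statement that the probability of Y=1 rises with X.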

How is the logistic regression probability formula calculated?

The model estimates P(Y=1) using the formula P = 1 / (1 + e^−(β₀ + β₁·x)). This S-shaped (sigmoid) curve maps any real-valued linear combination of predictors to a probability between 0 and 1. The coefficients are estimated using maximum likelihood estimation (MLE), which iteratively finds the β values that make the observed data most probable.
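As a sketch of how MLE works under the hood (an illustration of the standard Newton-Raphson approach, not necessarily this calculator's exact implementation), the β values can be found by repeatedly stepping along the gradient of the log-likelihood, scaled by the Fisher information:

```python
import math

def sigmoid(z):
    """The S-curve: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=25):
    """Fit P(Y=1) = sigmoid(b0 + b1*x) by Newton-Raphson MLE."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: beta += inverse(information) @ gradient (2x2 inverse)
        det = h00 * h11 - h01 * h01
        b0 += ( h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1
```

At convergence the gradient is (numerically) zero, which is exactly the condition that the observed data are most probable under the fitted β values.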

What is log-likelihood and why does it matter?

Log-likelihood measures how well the fitted model explains the observed data. A higher (less negative) log-likelihood indicates a better fit. It is used to compute other diagnostics like the residual deviance (−2 × log-likelihood) and AIC, which help compare models and assess goodness of fit.
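The log-likelihood itself is just a sum over the observations. A minimal sketch, with made-up data, showing the formula each row contributes:

```python
import math

def log_likelihood(xs, ys, b0, b1):
    """Sum of y*log(p) + (1-y)*log(1-p) over all observations."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# With b0 = b1 = 0 every fitted probability is 0.5, so ll = n * log(0.5)
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
print(log_likelihood(xs, ys, 0.0, 0.0))  # 6 * log(0.5) ≈ -4.1589
```

Better-fitting coefficients push each fitted p toward its observed y, making each log term less negative and raising the total.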

What is the difference between null deviance and residual deviance?

Null deviance measures how well a model with no predictors (only an intercept) fits the data. Residual deviance measures how well your fitted model (with predictors) fits. A large reduction from null to residual deviance indicates your predictor significantly improves the model. Ideally, residual deviance should be much lower than null deviance.
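To make the comparison concrete, here is a sketch of the null deviance for a hypothetical outcome vector. The intercept-only model fits every observation with the same probability, the overall success rate:

```python
import math

# Hypothetical outcomes (for illustration only)
ys = [0, 0, 1, 0, 1, 1]

# Null model: every fitted probability equals p_bar = (number of 1s) / n
p_bar = sum(ys) / len(ys)
null_ll = sum(y * math.log(p_bar) + (1 - y) * math.log(1 - p_bar) for y in ys)
null_deviance = -2.0 * null_ll

print(f"null deviance: {null_deviance:.4f}")
```

The residual deviance is computed the same way, except each observation uses its own fitted probability from the model with predictors; the drop from null to residual deviance is the improvement the predictor buys.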

What is AIC and how do I use it to compare models?

AIC (Akaike Information Criterion) balances model fit against complexity. It is calculated as AIC = −2 × log-likelihood + 2 × number of parameters. Lower AIC values indicate a better model. AIC is most useful when comparing two or more competing models on the same dataset — choose the model with the lowest AIC.
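The AIC arithmetic is simple enough to show directly; the log-likelihood value below is made up for illustration:

```python
def aic(log_lik, n_params):
    """AIC = -2 * log-likelihood + 2 * number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * n_params

# A single-predictor logistic model estimates 2 parameters: b0 and b1.
# Hypothetical fitted log-likelihood of -3.5:
print(aic(-3.5, 2))  # -2 * (-3.5) + 2 * 2 = 11.0
```

Note that the penalty term means a model with extra predictors must improve the log-likelihood by more than 1 unit per parameter to lower its AIC.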

How many data points do I need for logistic regression?

A common rule of thumb is to have at least 10 events (occurrences of the less frequent outcome) per predictor variable. For a single predictor model, this means at least 10 cases of both Y=0 and Y=1. Small samples can lead to unstable coefficient estimates and overfitting, so larger datasets generally yield more reliable results.

What does the predicted probability output mean?

When you enter a specific X value in the 'Predict Probability' field, the calculator uses your fitted model to estimate P(Y=1) at that value. For example, if X represents patient age and Y represents disease presence, entering X=50 gives the estimated probability that a 50-year-old has the disease, based on your data.
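The prediction step is a single evaluation of the sigmoid formula. A sketch using hypothetical coefficients (β₀ = −5, β₁ = 0.1), chosen so the math is easy to check by hand:

```python
import math

def predict_prob(x, b0, b1):
    """P(Y=1) at X = x under the fitted model: 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical age/disease coefficients (not from real data)
b0, b1 = -5.0, 0.1

# At X = 50 the linear predictor is -5 + 0.1 * 50 = 0, so P = 0.5
print(predict_prob(50, b0, b1))  # 0.5
```

Any other X value plugs into the same formula; the result is always between 0 and 1.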
