Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

Put differently, regression estimates the form and strength of the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
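
As an illustration of the description above, here is a minimal sketch of the simplest case: an ordinary least squares fit of a straight line with NumPy. The function name `fit_ols` and the toy data are assumptions made for this example only; real questions under this tag use a wide range of libraries and model types.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictor.
    X = np.column_stack([np.ones_like(x), x])
    # lstsq returns the coefficients that minimise the squared residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = coef
    return intercept, slope

# Hypothetical toy data lying exactly on the line y = 2 + 3x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]

intercept, slope = fit_ols(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Here the independent variable is `x`, the dependent variable is `y`, and the fitted intercept and slope are the parameters that describe how the expected value of `y` changes with `x`.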

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
241 votes, 10 answers

Find p-value (significance) in scikit-learn LinearRegression

How can I find the p-value (significance) of each coefficient? lm = sklearn.linear_model.LinearRegression() lm.fit(x,y)
elplatt
185 votes, 6 answers

Adding a regression line on a ggplot

I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this... data =…
Remi.b
160 votes, 6 answers

How to force R to use a specified factor level as reference in a regression?

How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression? It's just using some level by default. lm(x ~ y + as.factor(b)) with b {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that…
Matt Bannert
133 votes, 6 answers

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40,…
Michael
133 votes, 10 answers

Linear Regression and group by in R

I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm…
JD Long
86 votes, 4 answers

Extract regression coefficient values

I have a regression model for some time series data investigating drug utilisation. The purpose is to fit a spline to a time series and work out 95% CI etc. The model goes as follows: id <- ts(1:length(drug$Date)) a1 <- ts(drug$Rate) a2 <-…
John
77 votes, 4 answers

Linear regression analysis with string/categorical features (variables)?

Regression algorithms seem to be working on features represented as numbers. For example: This data set doesn't contain categorical features/variables. It's quite clear how to do regression on this data and predict price. But now I want to do a…
68 votes, 3 answers

scikit-learn cross validation, negative values with mean squared error

When I use the following code with Data matrix X of size (952,144) and output vector y of size (952), mean_squared_error metric returns negative values, which is unexpected. Do you have any idea? from sklearn.svm import SVR from sklearn import…
ahmethungari
64 votes, 5 answers

Screening (multi)collinearity in a regression model

I hope that this one is not going to be "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in the regression model. How to cure them... well, sometimes you don't need to "cure"…
aL3xa
62 votes, 3 answers

How to debug "contrasts can be applied only to factors with 2 or more levels" error?

Here are all the variables I'm working with: str(ad.train) $ Date : Factor w/ 427 levels "2012-03-24","2012-03-29",..: 4 7 12 14 19 21 24 29 31 34 ... $ Team : Factor w/ 18 levels "Adelaide","Brisbane Lions",..: 1 1 1…
Troy
59 votes, 3 answers

fitting data with numpy

I have the following data: >>> x array([ 3.08, 3.1 , 3.12, 3.14, 3.16, 3.18, 3.2 , 3.22, 3.24, 3.26, 3.28, 3.3 , 3.32, 3.34, 3.36, 3.38, 3.4 , 3.42, 3.44, 3.46, 3.48, 3.5 , 3.52, 3.54, 3.56, 3.58, 3.6 , 3.62, …
ezitoc
58 votes, 2 answers

Display regression equation in seaborn regplot

Does anyone know how to display the regression equation in seaborn using sns.regplot or sns.jointplot? regplot doesn't seem to have any parameter that you can pass to display regression diagnostics, and jointplot only displays the Pearson R^2,…
Vikram Josyula
56 votes, 8 answers

Java-R integration?

I have a Java app which needs to perform partial least squares regression. It would appear there are no Java implementations of PLSR out there. Weka might have had something like it at some point, but it is no longer in the API. On the other hand, I…
mbatchkarov
56 votes, 3 answers

extracting standardized coefficients from lm in R

My apologies for the dumb question, but I can't seem to find a simple solution. I want to extract the standardized coefficients from a fitted linear model (in R). There must be a simple way or function that does that. Can you tell me what it is? EDIT…
amit
56 votes, 3 answers

Quadratic and cubic regression in Excel

I have the following information: Height Weight 170 65 167 55 189 85 175 70 166 55 174 55 169 69 170 58 184 84 161 56 170 75 182 68 167 51 …
user466534