Questions tagged [linear-regression]

for issues related to linear regression modelling approach

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

heighti = b0 + b1agei + εi, where b0 is the intercept, b1 is a parameter that age is multiplied by to get a prediction of height, ε is the error term, and i is the subject

Model 2

heighti = b0 + b1agei + b2sexi + εi, where the variable sex is dichotomous

In linear regression, user data X is modelled using linear functions Y, and unknown model parameters W are estimated or learned from the data. E.g., a linear regression model for a k-dimensional user data can be represented as :

Y = w1 x1 + w2 x2 + ... + wk xk

Reading Statistical Modeling: The Two Cultures http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In scientific software for statistical computing and graphics, function lm (see ) implements linear regression.

6517 questions
313
votes
10 answers

Add regression line equation and R^2 on graph

I wonder how to add regression line equation and R^2 on the ggplot. My code is: library(ggplot2) df <- data.frame(x = c(1:100)) df$y <- 2 + 3 * df$x + rnorm(100, sd = 40) p <- ggplot(data = df, aes(x = x, y = y)) + geom_smooth(method =…
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
279
votes
15 answers

What is the difference between linear regression and logistic regression?

When we have to predict the value of a categorical (or discrete) outcome we use logistic regression. I believe we use linear regression to also predict the value of an outcome given the input values. Then, what is the difference between the two…
London guy
  • 27,522
  • 44
  • 121
  • 179
230
votes
7 answers

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential or logarithmic). I use Python and Numpy and for polynomial fitting there is a function polyfit(). But I found no such functions for…
Tomas Novotny
  • 7,547
  • 9
  • 26
  • 23
185
votes
6 answers

Adding a regression line on a ggplot

I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this... data =…
Remi.b
  • 17,389
  • 28
  • 87
  • 168
160
votes
6 answers

How to force R to use a specified factor level as reference in a regression?

How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression? It's just using some level by default. lm(x ~ y + as.factor(b)) with b {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that…
Matt Bannert
  • 27,631
  • 38
  • 141
  • 207
153
votes
15 answers

Multiple linear regression in Python

I can't seem to find any python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.). For example, with this…
Zach
  • 4,624
  • 13
  • 43
  • 60
133
votes
10 answers

Linear Regression and group by in R

I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm…
JD Long
  • 59,675
  • 58
  • 202
  • 294
113
votes
8 answers

Linear regression with matplotlib / numpy

I'm trying to generate a linear regression on a scatter plot I have generated, however my data is in list format, and all of the examples I can find of using polyfit require using arange. arange doesn't accept lists though. I have searched high and…
user771224
111
votes
8 answers

Accuracy Score ValueError: Can't Handle mix of binary and continuous target

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. I have a problem to evaluate the predicted results using the accuracy_score metric. This is my true Data : array([1, 1, 0, 0, 0, 0, 1, 1, 0,…
82
votes
8 answers

How to overplot a line on a scatter plot in python?

I have two vectors of data and I've put them into pyplot.scatter(). Now I'd like to over plot a linear fit to these data. How would I do this? I've tried using scikitlearn and np.polyfit().
goldisfine
  • 4,742
  • 11
  • 59
  • 83
77
votes
4 answers

Linear regression analysis with string/categorical features (variables)?

Regression algorithms seem to be working on features represented as numbers. For example: This data set doesn't contain categorical features/variables. It's quite clear how to do regression on this data and predict price. But now I want to do a…
75
votes
4 answers

why gradient descent when we can solve linear regression analytically

what is the benefit of using Gradient Descent in the linear regression space? looks like the we can solve the problem (finding theta0-n that minimum the cost func) with analytical method so why we still want to use gradient descent to do the same…
John
  • 2,107
  • 3
  • 22
  • 39
74
votes
6 answers

How to get a regression summary in scikit-learn like R does?

As an R user, I wanted to also get up to speed on scikit. Creating a linear regression model(s) is fine, but can't seem to find a reasonable way to get a standard summary of regression output. Code example: # Linear Regression import numpy as…
mpg
  • 3,679
  • 8
  • 36
  • 45
65
votes
5 answers

gradient descent using python and numpy

def gradient(X_norm,y,theta,alpha,m,n,num_it): temp=np.array(np.zeros_like(theta,float)) for i in range(0,num_it): h=np.dot(X_norm,theta) #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j][np.newaxis,:] ) ) …
59
votes
1 answer

How to calculate the 95% confidence interval for the slope in a linear regression model in R

Here is an exercise from Introductory Statistics with R: With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression model to the relation. According to the fitted model, what is the predicted metabolic rate for a body…
Yu Fu
  • 1,151
  • 1
  • 8
  • 15
1
2 3
99 100