
Hi, I have two lists of numbers and I want to get the R^2 from an ordinary linear regression. I think this question has been asked many times, but I just cannot find an answer anywhere.

My lists:

my_y = [2,5,6,10]
my_x = [19,23,22,30]

I tried converting them to numpy arrays and then using sklearn to run the regression and get what I need, but I did not succeed. I used the following code:

from sklearn.linear_model import LinearRegression
import numpy as np

my_y = np.array([2,5,6,10]).reshape(1, -1)
my_x = np.array([19,23,22,30]).reshape(1,-1)

lm = LinearRegression()
result = lm.score(my_x, my_y)
print(result)

Does anyone have a fast way to get the R^2 from doing a linear regression between those 2 variables?

My expected output from this regression is: R^2=0.930241

Adrian
  • Does this answer your question? [Run an OLS regression with Pandas Data Frame](https://stackoverflow.com/questions/19991445/run-an-ols-regression-with-pandas-data-frame) – vestland Mar 27 '20 at 12:01

2 Answers


Try:

import scipy

my_y = [2,5,6,10]
my_x = [19,23,22,30]

slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(my_x, my_y)
    
print(r_value**2)

and you get:

0.9302407516147975

From scipy version 1.4.1 on, import `stats` explicitly (thanks to @FlamePrinz for noting the issue with newer versions of scipy):

from scipy import stats

my_y = [2,5,6,10]
my_x = [19,23,22,30]

slope, intercept, r_value, p_value, std_err = stats.linregress(my_x, my_y)

print(r_value**2)
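
As a cross-check that does not depend on scipy, the same R^2 can be computed from its definition, 1 - SS_res / SS_tot. This is only a sketch: it assumes a straight-line fit via `np.polyfit`, and the names `predicted`, `ss_res`, `ss_tot` and `r_squared` are made up for illustration.

```python
import numpy as np

my_y = np.array([2, 5, 6, 10], dtype=float)
my_x = np.array([19, 23, 22, 30], dtype=float)

# Least-squares fit of a degree-1 polynomial: y ~ slope * x + intercept
slope, intercept = np.polyfit(my_x, my_y, 1)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
predicted = slope * my_x + intercept
ss_res = np.sum((my_y - predicted) ** 2)
ss_tot = np.sum((my_y - my_y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(r_squared)
```

For a simple regression with one predictor and an intercept, this value should agree with `r_value**2` from `linregress` up to floating-point rounding.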
sentence
  • This could cause the error: `AttributeError: module 'scipy' has no attribute 'stats'`. You would want to instead do: `from scipy import stats`, and then call it with: `slope, intercept, r_value, p_value, std_err = stats.linregress(my_x, my_y)` – FlamePrinz Jul 21 '20 at 21:06

From a quick look at the documentation, `sklearn.linear_model` requires you to fit a linear model first, as the name suggests. To get the correlation coefficient R directly:

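import numpy as np
from scipy import stats

my_y = np.array([2,5,6,10])
my_x = np.array([19,23,22,30])
# linregress returns (slope, intercept, rvalue, pvalue, stderr); index 2 is R
R = stats.linregress(my_x, my_y)[2]
print(R)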

0.9644898919194527

and R**2 yields the desired result of 0.930.
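
For completeness, here is a sketch of how the sklearn attempt from the question could be made to work. It assumes two changes: the x values are reshaped with `reshape(-1, 1)` so that they form four samples with one feature each (`reshape(1, -1)` gives a single sample with four features), and `fit` is called before `score`. The name `r_squared` is just for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

my_y = np.array([2, 5, 6, 10])
# sklearn expects X with shape (n_samples, n_features): here (4, 1)
my_x = np.array([19, 23, 22, 30]).reshape(-1, 1)

lm = LinearRegression()
lm.fit(my_x, my_y)                # the model must be fitted before scoring
r_squared = lm.score(my_x, my_y)  # score() returns R^2 for regressors

print(r_squared)
```

This should give the same R^2 as squaring the R value from `linregress`, since both fit an ordinary least-squares line with an intercept.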

George
  • Hi. Hmmm no? I was expecting the result to be equal to 0.930241. To get this result I used Excel's Regression tool and I should be able to get the same result with python as well... – Adrian Apr 16 '19 at 17:55
  • Ok let me check. – George Apr 16 '19 at 17:57
  • Yeah. That is weird. Honestly, I do not know why that is. I don't work much with scipy, but it seems to work if I implement the other solution from sentence. I would look through the documentation, but I have to present some results in a few hours and do not have time. I will come back to this some other time and dig into it! – Adrian Apr 16 '19 at 18:09
  • @George the OP asked for R-*squared*, not R. – James Phillips Apr 16 '19 at 18:29
  • @Adrian I went through the docs and as the name `sklearn.linear_model` suggests, a linear model needs to be provided, for the program to 'learn' and then put out a result. That's why it didn't work correctly. – George Apr 16 '19 at 18:38
  • @George your edit to square the value for R is now giving the correct result. – James Phillips Apr 16 '19 at 22:32