54

Here is what I am doing:

$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels
>>> import statsmodels.api as sm
>>> statsmodels.__version__
'0.5.0'
>>> import numpy 
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([ 1.82352941])

I expected an array with two elements: the intercept and the slope coefficient. Why is there only one?

Tom
  • 3
    [Docs](http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html): An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant. – alko Dec 20 '13 at 10:33
  • 4
    What is the significance of add_constant() here? When I fit a linear regression model, I expect it to have an intercept, y = mX + C. Why make the user add a constant to the input as an extra step? – Abhi Sep 18 '16 at 22:57
  • Interestingly, if you use the R-like formula API in statsmodels, you get the intercept by default. – MJMacarty Jan 06 '19 at 00:43

6 Answers

85

Try this:

import statsmodels.api as sm

X = sm.add_constant(X)  # prepend a column of ones for the intercept
res_ols = sm.OLS(y, X).fit()

as in the documentation:

An intercept is not included by default and should be added by the user.

See statsmodels.tools.tools.add_constant.
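
To see what the added constant actually is, here is a minimal sketch that builds the column of ones by hand with NumPy and fits with np.linalg.lstsq instead of statsmodels. The names X_const, beta and fitted are my own, and this only illustrates the idea; it is not a claim about how add_constant is implemented.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)
y = np.arange(1, 10, dtype=float)

# Design matrix with a leading column of ones (the intercept term),
# built by hand here rather than by calling sm.add_constant.
X_const = np.column_stack([np.ones_like(x), x])

# Ordinary least squares via NumPy; rcond=None selects the default cutoff.
beta, *_ = np.linalg.lstsq(X_const, y, rcond=None)
intercept, slope = beta

# Each fitted value is intercept * 1 + slope * x, so the column of ones
# is what gives the intercept a coefficient of its own.
fitted = X_const @ beta

print(X_const[:3])
print(intercept, slope)
```

Without the ones column the design matrix has a single column, so there is only one coefficient to estimate and no place for an intercept to appear.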

Max Ghenis
behzad.nouri
  • 34
    I am quite puzzled by this. Why isn't an intercept added by default? Why do you want to run the linear regression *without* the bloody constant? It makes no sense to me. – FaCoffee Oct 16 '17 at 18:24
  • What does adding a column of ones to an array do to X? – Golden Lion Jan 31 '22 at 21:20
10

Just to be complete, this works:

>>> import numpy 
>>> import statsmodels.api as sm
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> X = sm.add_constant(X)
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([-0.35714286,  1.92857143])

It does give me a different slope coefficient, but I guess that figures as we now do have an intercept.
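
The change in slope can be checked by hand. Here is a rough sketch in plain Python using the textbook closed-form formulas for a single regressor (the variable names are mine, not anything from statsmodels):

```python
x = [1, 1, 2, 2, 3, 3, 4, 4, 5]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(x)

sum_xy = sum(a * b for a, b in zip(x, y))
sum_xx = sum(a * a for a in x)

# No constant: the fitted line is forced through the origin,
# so the slope is sum(x*y) / sum(x*x).
slope_no_const = sum_xy / sum_xx

# With a constant: slope is the centered cross-product over the
# centered sum of squares, and the intercept follows from the means.
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum_xy - n * mean_x * mean_y
s_xx = sum_xx - n * mean_x ** 2
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(slope_no_const)
print(intercept, slope)
```

Without a constant the slope has to absorb whatever the intercept would otherwise explain, which is why the two fits disagree. The printed values should match the two params arrays above up to rounding.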

Community
Tom
4

Try this, it worked for me:

# statsmodels.api (not statsmodels.formula.api) provides the array-based OLS.
import statsmodels.api as sm

# Add the intercept column to both the training and the test features.
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

model = sm.OLS(y_train, X_train)
results = model.fit()
y_pred = results.predict(X_test)

results.params
Vishesh Shrivastav
sup
    use `import statsmodels.api as sm` instead. `formula.api` will not have `OLS` (capital case) in the next release, only `ols` (lower case for formula interface) – Josef Oct 05 '18 at 19:14
2

I'm running 0.6.1 and it looks like the "add_constant" function has been moved into the statsmodels.tools module. Here's what I ran that worked:

import statsmodels.tools  # binds the name statsmodels and loads the tools module
res_ols = sm.OLS(y, statsmodels.tools.add_constant(X)).fit()
1

I added X = sm.add_constant(X), but Python did not return the intercept value, so I used a little algebra and added the intercept myself in code.

This code computes a regression over 35 samples with 7 features plus one intercept value, which I added to the equation as an extra feature:

import statsmodels.api as sm
import numpy as np
import pandas as pd

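x = np.empty((35, 8))  # (numSamples, oneIntercept + numFeatures)
feature_names = [""] * 8  # column labels, filled in while reading
y = np.empty((35,))

# read all rows up front; the file is closed when the block exits
with open("dataset.csv") as f:
    dbfv = f.readlines()

interceptConstant = 1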
i = 0
# reading data and writing in numpy arrays
while i<len(dbfv):
    cells = dbfv[i].split(",")
    j = 0
    x[i][j] = interceptConstant
    feature_names[j] = str(j)
    while j<len(cells)-1:
        x[i][j+1] = cells[j]
        feature_names[j+1] = str(j+1)
        j += 1
    y[i] = cells[len(cells)-1]
    i += 1
# creating dataframes
df = pd.DataFrame(x, columns=feature_names)

target = pd.DataFrame(y, columns=["TARGET"])

X = df
y = target["TARGET"]

model = sm.OLS(y, X).fit()

print(model.params)

# predictions = model.predict(X) # make the predictions by the model


# Print out the statistics
print(model.summary())
R.jzadeh
0

Try this

X = sm.add_constant(X)
ols= sm.OLS(y,X)
res_ols= ols.fit()
res_ols.params
res_ols.params[0]
res_ols.params[1]
print(res_ols.summary())
Marco Bonelli
kyramichel