1

I have the following R code with binomial regression to fit the y and polynomial of x

res = glm(df.mat ~ poly(x, deg=degree), family=binomial(link="logit"))

and the result is

R code result

However, when I use the statsmodels GLM function in Python as

import statsmodels.api as sm

def poly(x, d):
    x = np.array(x)
    X = np.transpose(np.vstack((x**k for k in range(d+1))))
    return np.linalg.qr(X)[0][:,1:]

res_m = sm.GLM(np.array(df_mat), poly(x, 7), family = sm.families.Binomial())
result = res_m.fit()

with the polynomial function created by Python equivalent to R poly() function? then the result is

Python result

merv
  • 67,214
  • 13
  • 180
  • 245
Jun Hou
  • 13
  • 3
  • 1
    Either the difference arises in the `poly` calculation or the GLM fitting (or both). Have you compared the result of R's `poly(x, deg = degree)` with Python's `poly(x, 7)` to see if the issue is there? If you need more help than that, you should probably post some sample data. – Gregor Thomas Oct 06 '20 at 16:06
  • 1
    keeping zero power in polynomial would keep the constant. `return np.linalg.qr(X)[0]` – Josef Oct 07 '20 at 17:45

1 Answers1

2

According to the documentation, the statsmodels GLM does not automatically include an intercept term, which the R model does. Try including a column of ones in the matrix you pass as exog to the sm.GLM. The documentation notes a convenience function exists for this very purpose: statsmodels.tools.add_constant.

merv
  • 67,214
  • 13
  • 180
  • 245