I get completely different results with the same datasets in R and Python, and I cannot understand why this happens.
R:
library(RcppCNPy)
d     <- npyLoad("/home/vvkovalchuk/bin/src/python/asks1.npy")   # dependent variable
datas <- npyLoad("/home/vvkovalchuk/bin/src/python/bids2.npy")   # predictor
m <- lm(d ~ datas)   # simple linear regression of d on datas
summary(m)
Python:
import numpy
import statsmodels.api as sm
Y = numpy.load('./asks1.npy', allow_pickle=True)   # dependent variable
X = numpy.load('./bids2.npy', allow_pickle=True)   # predictor
X3 = sm.add_constant(X)                            # add an intercept column
res_ols = sm.OLS(Y, X3).fit()
print(res_ols.params)
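To check whether both languages are even fitting the same numbers, I could compare basic statistics of what Python loads against summary(d) and summary(datas) in R (a sketch of my own, assuming the files load as plain numeric arrays):

import numpy

# Hypothetical sanity check (my own addition): confirm both files parse as
# plain numeric arrays, then compare these numbers with summary(d) and
# summary(datas) in R.
Y = numpy.load('./asks1.npy', allow_pickle=True)
X = numpy.load('./bids2.npy', allow_pickle=True)
print(Y.dtype, Y.shape)   # an object dtype would suggest a pickled array
print(X.dtype, X.shape)
print(Y.min(), Y.max(), Y.mean())
print(X.min(), X.max(), X.mean())

If the dtypes, shapes, or value ranges differ between the two languages, the two regressions are not comparable to begin with.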
What am I doing wrong?
Results:
R:
Call:
lm(formula = d ~ datas)
Residuals:
       Min         1Q     Median         3Q        Max
-6.089e+06  8.797e+07  2.163e+08  2.179e+08  1.122e+10

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.561e+00  2.253e+06       0        1
datas        3.809e+03  2.164e+09       0        1
Residual standard error: 208100000 on 14639 degrees of freedom
Multiple R-squared: 0.2735, Adjusted R-squared: 0.2735
F-statistic: 5512 on 1 and 14639 DF, p-value: < 2.2e-16
Python:
OLS Regression Results
Dep. Variable:                      y   R-squared:                       0.112
Model:                            OLS   Adj. R-squared:                  0.112
Method:                 Least Squares   F-statistic:                     1846.
Date:                Thu, 25 Mar 2021   Prob (F-statistic):               0.00
Time:                        13:08:43   Log-Likelihood:             1.6948e+05
No. Observations:               14641   AIC:                        -3.390e+05
Df Residuals:                   14639   BIC:                        -3.389e+05
Df Model:                           1
Covariance Type:            nonrobust

                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0004   3.07e-06    126.136      0.000       0.000       0.000
x1             0.1478      0.003     42.969      0.000       0.141       0.155

Omnibus:                     3251.130   Durbin-Watson:                   0.004
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14606.605
Skew:                           1.019   Prob(JB):                         0.00
Kurtosis:                       7.449   Cond. No.                     1.83e+05
I also tried swapping the arguments of the OLS function, but the results still don't match. Could this be related to NAs?
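If NAs/NaNs are the issue, I could count them and tell statsmodels to drop incomplete rows explicitly (a sketch; missing='drop' is statsmodels' built-in option, while the .astype(float) cast and the counting are my own additions):

import numpy
import statsmodels.api as sm

Y = numpy.load('./asks1.npy', allow_pickle=True).astype(float)
X = numpy.load('./bids2.npy', allow_pickle=True).astype(float)

# How many missing values does each array contain?
print(numpy.isnan(Y).sum(), numpy.isnan(X).sum())

# Drop rows with missing values instead of letting them distort the fit
X3 = sm.add_constant(X)
res_ols = sm.OLS(Y, X3, missing='drop').fit()
print(res_ols.params)

As far as I understand, R's lm() silently drops rows with NA by default (na.action = na.omit), so the two fits might not even be using the same rows.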