I am fitting a linear regression model using:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression

X = dataset2.iloc[:, :-1]   # all columns except the last are features
y = dataset2.iloc[:, -1]    # the last column is the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training set only
X_test = scaler.transform(X_test)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
Then, I extract the coefficients with:
coefficients = regressor.coef_
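Note that coef_ holds only the per-feature slopes; scikit-learn stores the fitted intercept separately, which is why my manual estimate below has one extra entry:

intercept = regressor.intercept_  # the intercept is not included in coef_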
However, since I also need the standard errors (i.e. the variance-covariance matrix of the coefficients), I am computing the OLS estimates manually:
features = dataset2.iloc[:, :-1]  # same columns as X above
N = len(X_train)                  # number of observations
k = len(features.columns) + 1     # number of regressors, plus one for the intercept
X_with_intercept = np.empty(shape=(N, k), dtype=float)
X_with_intercept[:, 0] = 1        # first column is the constant term
X_with_intercept[:, 1:k] = X_train
# b = (X'X)^-1 X'y  (@ is the matrix multiplication operator)
beta_hat = np.linalg.inv(X_with_intercept.T @ X_with_intercept) @ X_with_intercept.T @ y_train
print(beta_hat)
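For context, the follow-up step I plan to run once the coefficients match (adapted from the linked post below; these are the standard OLS formulas, and the variable names are mine):

# Sketch: estimate the error variance from the residuals, then build the
# variance-covariance matrix of beta_hat and take the diagonal for the SEs.
y_hat = X_with_intercept @ beta_hat
residuals = y_train - y_hat
sigma_squared_hat = residuals.T @ residuals / (N - k)  # unbiased variance estimate
var_beta_hat = np.linalg.inv(X_with_intercept.T @ X_with_intercept) * sigma_squared_hat
standard_errors = np.sqrt(np.diag(var_beta_hat))       # one SE per coefficient, intercept first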
which returns:
[ 0. 0. -0.01 -0. 0. 0. -0. -0. -0. -0. -0. ]
On the other hand, coefficients contains:
[0.0021308430119209416, -0.006294407027962639, -0.0021887043694901707, 0.004512777544097981, 0.000550417874231508, -0.0003297844194107745, -0.0019042607512515818, -0.0011443799090231155, -0.0012652793840597606, -0.0017634228809034023]
I'd like to print more decimal places so that I can compare the two methods properly. I tried round(beta_hat, 6), but the printed output didn't change...
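As a stopgap, I can inspect individual entries by converting them to plain Python floats, which print with full precision (skipping beta_hat[0], the intercept, so the two arrays line up), but I would like the whole-array display to show more digits:

for b, c in zip(beta_hat[1:], coefficients):  # beta_hat[0] is the intercept
    print(float(b), float(c))                 # plain floats print with full precision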
The manual computation is adapted from: Python scikit learn Linear Model Parameter Standard Error