I'm using StandardScaler() and the lin_reg.coef_ attribute in the following context:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

for i in range(100):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=i)
    scaler = StandardScaler().fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    lin_reg = LinearRegression().fit(x_train, y_train)
    if i == 0:
        print(lin_reg.coef_)
    if i == 1:
        print(lin_reg.coef_)
This leads to the following output:
[screenshot of the two printed coefficient arrays, for i == 0 and i == 1, not reproduced here]
As expected, coef_ returns the coefficients for the 22 features I pass into the linear regression. However, in the second output some of the coefficients are far too large (e.g. 1.61e+14). I am fairly sure that the scaling with StandardScaler() works as it should, yet if I do not scale the training data before fitting and reading coef_, I do not get these huge coefficients. One important detail: the last 13 features are binary, whereas the first 9 are continuous (such as age). I suspect the problem is somehow related to this, although the coefficient of the first binary feature is computed properly (only the last 12 binary features have excessively large coefficients).
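To narrow this down, I could check whether the split that produces the huge coefficients yields a (nearly) rank-deficient design matrix after scaling, since that would let ordinary least squares pick almost arbitrary values along the dependent directions. Below is a minimal diagnostic sketch only, assuming x and y are the same arrays used in the loop above and that random_state=1 is the split behind the second output:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Diagnostic only: if some of the binary columns are (nearly) linearly
# dependent in a given split, the scaled design matrix becomes ill-conditioned
# and plain least squares can produce extremely large coefficients.
x_train, _, y_train, _ = train_test_split(x, y, test_size=0.3, random_state=1)
x_train_scaled = StandardScaler().fit_transform(x_train)

print("rank:", np.linalg.matrix_rank(x_train_scaled), "of", x_train_scaled.shape[1])
print("condition number:", np.linalg.cond(x_train_scaled))

If the rank comes out below 22, or the condition number is enormous for that particular split, that would point to (near-)collinearity among the binary features rather than a problem with StandardScaler itself.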