I have a dataframe with 5 columns. I remove one column and use it as y, the dependent variable. I will use the other 4 columns (x1, x2, x3, x4), i.e. the matrix X, to predict this variable with some form of regression model.
For example:
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit(X,y)
clf.coef_ then holds the regression coefficients, one for each variable. My question is: how do I find the p-value for each variable?
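To make the quantity in question concrete, here is a minimal sketch of what a per-variable p-value would look like if computed by hand from the fitted model. It assumes synthetic data (the sample size, the "true" coefficients, and the noise level are made up for illustration) and the classical ordinary least squares assumptions (independent, homoscedastic, normally distributed errors). It is not a built-in scikit-learn feature; the standard errors come from the textbook formula sigma^2 * (X'X)^-1.

```python
import numpy as np
from scipy import stats
from sklearn import linear_model

# Synthetic stand-in for the dataframe: 4 predictors, 200 rows (assumed values).
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
true_coef = np.array([2.0, -1.0, 0.0, 0.5])  # hypothetical; x3 has no effect
y = 1.0 + X @ true_coef + rng.normal(scale=1.0, size=n)

clf = linear_model.LinearRegression()
clf.fit(X, y)

# Design matrix with an explicit intercept column, matching fit_intercept=True.
Xd = np.column_stack([np.ones(n), X])
beta = np.concatenate([[clf.intercept_], clf.coef_])

# Residual variance estimate with n - (p + 1) degrees of freedom.
resid = y - Xd @ beta
dof = n - Xd.shape[1]
sigma2 = resid @ resid / dof

# Standard errors from the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))

# Two-sided p-values from the t distribution; index 0 is the intercept.
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
```

Here p_values[1:] lines up with clf.coef_, one entry per column of X. These values are only meaningful for the unpenalized fit; they do not carry over to lasso or ridge.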
Also, when comparing multiple sklearn models (here from sklearn.linear_model), how do I plot them to see the effects of linear regression vs. lasso vs. ridge regression, etc.?
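As an illustration of the kind of comparison meant here, the sketch below fits the three estimators on the same synthetic data and collects their coefficients side by side. The data, the alpha values, and the helper name plot_coefs are assumptions chosen for the example, and a grouped bar chart is only one possible way to visualize the result.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for X and y (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(size=200)

# The alpha values are arbitrary; in practice they would be tuned.
models = {
    "linear": linear_model.LinearRegression(),
    "ridge": linear_model.Ridge(alpha=10.0),
    "lasso": linear_model.Lasso(alpha=0.1),
}

# One column per model, one row per predictor.
coefs = pd.DataFrame(
    {name: model.fit(X, y).coef_ for name, model in models.items()},
    index=["x1", "x2", "x3", "x4"],
)

def plot_coefs(table):
    """Hypothetical helper: grouped bar chart of coefficients per model."""
    import matplotlib.pyplot as plt  # imported here so the table works without it
    ax = table.plot.bar()
    ax.set_ylabel("coefficient")
    plt.show()
```

Calling plot_coefs(coefs) draws one group of bars per predictor, which makes the shrinkage from ridge and lasso visible relative to the plain linear fit.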