I took an online course where the instructor explained backward elimination using a dataset of shape (50, 5), where you eliminate the columns manually by looking at their p-values:
import numpy as np
import statsmodels.api as sm
X = np.append(arr = np.ones((2938, 1)).astype(int), values = X, axis = 1)
X_opt = X[:, [0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# Second step: drop the column with the highest p-value (index 2 here) and refit
X_opt = X[:, [0,1,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
# and so on
Now, while practicing on a large dataset of shape (2938, 214), do I have to eliminate all the columns myself? That is a lot of work. Is there some sort of algorithm or way to automate it?
This might be a stupid question, but I am a beginner in machine learning, so any help is appreciated. Thanks.