I tried to run the code below in Python 3.7. It loops through every combination of data columns in the DataFrame 'Rawdata', fits a regression model on each subset using the statsmodels library, and returns the best one. The code does not raise any errors until I run the last line, best_subset(X, Y), which fails with "IndexingError: Too many indexers".
Any idea what's wrong and how to fix it?
It would be great if someone could help. Thanks!
# Imports
from itertools import chain, combinations
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Data
Rawdata = pd.read_csv(r'C:\Users\Lucas\Documents\sample.csv')

# Main code
def best_subset(X, Y):
    n_features = X.shape[1]
    # Every non-empty combination of column positions, as tuples
    subsets = chain.from_iterable(combinations(range(n_features), k+1) for k in range(n_features))
    best_score = -np.inf
    best_subset = None
    for subset in subsets:
        lin_reg = sm.OLS(Y, X.iloc[:, subset]).fit()
        score = lin_reg.rsquared_adj
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Define data inputs and call code above
X = Rawdata.iloc[:, 1:10]
Y = Rawdata.iloc[:, 0]

# To return best model
best_subset(X, Y)
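
A likely cause, offered as a guess rather than a confirmed diagnosis: combinations() yields tuples such as (0, 2), and passing a tuple as the column indexer in X.iloc[:, subset] can be read by pandas as extra indexing dimensions rather than as a list of column positions. Converting each subset to a list avoids that ambiguity. The sketch below shows only the indexing step on synthetic data; the name X_demo and the column names "a", "b", "c" are hypothetical stand-ins for the real CSV, and the model fit is left out.

```python
from itertools import chain, combinations

import numpy as np
import pandas as pd

# Synthetic stand-in for X (hypothetical data with 3 predictor columns).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])

n_features = X_demo.shape[1]

# Same subset generator as in the question, materialised into a list.
subsets = list(chain.from_iterable(
    combinations(range(n_features), k + 1) for k in range(n_features)
))

# Each subset is a tuple; list(subset) makes .iloc treat it as
# a list of column positions instead of nested indexers.
selected = [X_demo.iloc[:, list(subset)] for subset in subsets]

print(len(subsets))
print(selected[-1].columns.tolist())
```

If this is indeed the cause, the corresponding change in best_subset would be to write X.iloc[:, list(subset)] on the sm.OLS line.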