How to fix Python/Pandas too many indexers error?

Question

I tried to run the code below in Python 3.7. It loops through every combination of data columns in the dataframe 'Rawdata', fits a regression model for each subset using the statsmodels library, and returns the best one. The code does not raise any errors until I run the last line, best_subset(X, Y), which returns "IndexingError: Too many indexers".

Any idea what's wrong/how to fix?

Would be great if someone can help! Thanks

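#Imports
from itertools import chain, combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

#Data
Rawdata = pd.read_csv(r'C:\Users\Lucas\Documents\sample.csv')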

#Main code
def best_subset(X, Y):
    n_features = X.shape[1]
    subsets = chain.from_iterable(combinations(range(n_features), k+1) for k in range(n_features))
    best_score = -np.inf
    best_subset = None
    for subset in subsets:
        lin_reg = sm.OLS(Y, X.iloc[:, subset]).fit()
        score = lin_reg.rsquared_adj
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

#Define data inputs and call code above
X = Rawdata.iloc[:, 1:10]
Y = Rawdata.iloc[:, 0]

#To return best model
best_subset(X, Y)

score 0 · Accepted Answer · answered Mar 26 '20 at 08:20

Your looping variable subset is always a tuple, containing between one and n_features column positions. If, for example, subset is (0, 1), your regression reads as

lin_reg = sm.OLS(Y, X.iloc[:, (0, 1)]).fit()

Pandas does not know how to handle a tuple as the column indexer inside .iloc (see here), which is what triggers the "Too many indexers" error. One solution is to convert subset from a tuple to a list:

for subset in subsets:
    subset = list(subset)
    lin_reg = sm.OLS(Y, X.iloc[:, subset]).fit()
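
To see the fix in isolation, here is a minimal sketch that leaves out the regression and only checks the column selection. The small DataFrame and its column names "a", "b", "c" are made up for illustration and stand in for your X; the subset generator is the one from your question.

```python
from itertools import chain, combinations

import pandas as pd

# Hypothetical stand-in for X from the question (three feature columns).
X = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
n_features = X.shape[1]

# Same generator as in the question: every non-empty combination of
# column positions. itertools.combinations yields each one as a tuple.
subsets = list(chain.from_iterable(
    combinations(range(n_features), k + 1) for k in range(n_features)
))

selected = []
for subset in subsets:
    # Convert the tuple to a list so that .iloc reads it as a list of
    # column positions for the second axis.
    cols = X.iloc[:, list(subset)]
    selected.append(list(cols.columns))

print(subsets)   # all seven subsets, each a tuple
print(selected)  # the column names picked for each subset
```

The same list(subset) conversion is what goes into the sm.OLS call in your loop.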

above_c_level

Then please [accept the answer](https://stackoverflow.com/help/someone-answers) – above_c_level Mar 27 '20 at 07:16
