Aggregating multiple excel files in python - list indexing error with pd.read_excel

Question

I'm trying to aggregate multiple files:

import glob
import pandas as pd
import time

list_of_files = glob.glob('../data/*.xlsx')

frame = pd.DataFrame()
list_ = []

for file_ in list_of_files:
    df = pd.read_excel(file_, ignore_index=True)
    list_.append(df)

frame = pd.concat(list_)

but I've given the following error:

screenshot-error-output

My list of files seems ok:

['../data/Chat Transcript Report 1.xlsx',
 '../data/Chat Transcript Report2.xlsx',
 '../data/Chat Transcript Report3.xlsx',
 '../data/Chat Transcript Report.xlsx']

But the error seems to traceback to the read_excel line, but I'm not sure what the problem is with the indexing. Any advice would be much appreciated!

Can you [edit] the full traceback of your error into your question? — asongtoruin, Oct 17 '18 at 10:20
Error text is too long for me to add to the post (apparently too much code in the post) so i've added as an image. Thanks — LNKirkham, Oct 17 '18 at 11:03

score 0 · Accepted Answer · answered Oct 17 '18 at 11:17

There's nothing wrong with Pandas indexing. There is something wrong, or incompatible, with your Excel file. This has come up at least a couple of times (see 1, 2). The issue may relate to ragged rows (see xlrd issues:ragged_rows) or some other problem with the Excel file.

I suggest you first identify the problem worksheet:

for file_ in list_of_files:
    try:
        df = pd.read_excel(file_, ignore_index=True)
        list_.append(df)
    except IndexError:
        print(file_)

Then open up the printed file_, copy its contents to another workbook, and try again with the new workbook. If it works, then this is evidence there is an aspect of the first workbook which is incompatible with xlrd, which is used by Pandas to read in data from Excel.

Thank you, that helped a lot! That highlighted one of the files which was a problem. Removed that from my list and the rest aggregated without any problems. Cheers! — LNKirkham, Oct 17 '18 at 11:27

Aggregating multiple excel files in python - list indexing error with pd.read_excel

1 Answers1