0

I'm trying to aggregate multiple files:

import glob
import pandas as pd
import time

list_of_files = glob.glob('../data/*.xlsx')

frame = pd.DataFrame()
list_ = []

for file_ in list_of_files:
    df = pd.read_excel(file_, ignore_index=True)
    list_.append(df)

frame = pd.concat(list_)

but I've given the following error:

screenshot-error-output

My list of files seems ok:

['../data/Chat Transcript Report 1.xlsx',
 '../data/Chat Transcript Report2.xlsx',
 '../data/Chat Transcript Report3.xlsx',
 '../data/Chat Transcript Report.xlsx']

But the error seems to traceback to the read_excel line, but I'm not sure what the problem is with the indexing. Any advice would be much appreciated!

jpp
  • 159,742
  • 34
  • 281
  • 339
LNKirkham
  • 13
  • 4

1 Answers1

0

There's nothing wrong with Pandas indexing. There is something wrong, or incompatible, with your Excel file. This has come up at least a couple of times (see 1, 2). The issue may relate to ragged rows (see xlrd issues:ragged_rows) or some other problem with the Excel file.

I suggest you first identify the problem worksheet:

for file_ in list_of_files:
    try:
        df = pd.read_excel(file_, ignore_index=True)
        list_.append(df)
    except IndexError:
        print(file_)

Then open up the printed file_, copy its contents to another workbook, and try again with the new workbook. If it works, then this is evidence there is an aspect of the first workbook which is incompatible with xlrd, which is used by Pandas to read in data from Excel.

jpp
  • 159,742
  • 34
  • 281
  • 339
  • Thank you, that helped a lot! That highlighted one of the files which was a problem. Removed that from my list and the rest aggregated without any problems. Cheers! – LNKirkham Oct 17 '18 at 11:27