1

I have one excel file with many sheets. There is only one column in every sheet, which is column A. I plan to read the excel file with read_excel() method. Hier is the code:

import pandas as PD

ExcelFile  = "C:\\AAA.xlsx"
SheetNames = ['0', '1', 'S', 'B', 'U'] 
# There are five sheets in this excel file. Those are the sheet names.

PageTotal  = len(SheetNames)

for Page in range(PageTotal):
    df = PD.read_excel(ExcelFile, header=None, squeeze = True, parse_cols = "A" ,sheetname=str(SheetNames[Page]))
    print df
    #do something with df

The problem is, the for loop runs only once. By running the second item in the for loop it shows me the following error text:

  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in __init__
    self.book = xlrd.open_workbook(io)
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 824, in open_workbook_2007_xml
    x12sst.process_stream(zflo, 'SST')
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 432, in process_stream_iterparse
    for event, elem in ET.iterparse(stream):
  File "<string>", line 103, in next
IndexError: pop from empty stack

As a beginner I have no idea about this error. Could anybody please help me to correct the codes? Thanks.

UPDATE Question:

If it is because that the excel file contains many formulars and external links, why the for loop could still run its first item? Confused.

Community
  • 1
  • 1
Rita
  • 2,117
  • 3
  • 15
  • 15
  • i think you should upload somewhere a sample excel file, which can help to reproduce this error and post here a link to this file. PS i couldn't reproduce your error using my sample excel file - everything was working as it should – MaxU - stand with Ukraine Aug 05 '16 at 04:22
  • @MaxU Thanks for the hint. I just tested, the code works well with a simple and small excel file. But my excel file is very large and complex, it contains many formels and many extern links. I am not sure if it works if I just upload one excel file...Anyway, the excel file should be the reason. Thanks! – Rita Aug 05 '16 at 10:05
  • It could be formulas and external links, causing these problems. So you can try to save/export your file as CSV file and read it in pandas – MaxU - stand with Ukraine Aug 05 '16 at 10:16
  • @MaxU CSV only supports one file with one sheet, right? I just have too many sheets. But you are right, I have to work with those external links. – Rita Aug 05 '16 at 10:45
  • @MaxU I believe I deleted all the formulas and external links. But it can still only run the first item in the loop. Now I am really confused. It shows the same error message. – Rita Aug 05 '16 at 11:54
  • i don't think the SO community can help you not being able to reproduce this error... – MaxU - stand with Ukraine Aug 05 '16 at 13:17
  • You can use dichotomy to locate the culplit: cut out half of the data and see if the problem goes away. If it doesn't cut half of the remaining and so on; if it does, add back half of the removed (or undo the remove and cut less this time). – ivan_pozdeev Aug 06 '16 at 11:11
  • In the meantime, I'm voting to close the question as omitting critical information. – ivan_pozdeev Aug 06 '16 at 11:12

3 Answers3

1

Why are you using sheetname=str(SheetNames[Page])?

If I understand your question properly I think what you want is:

import pandas as PD

excel_file  = r"C:\\AAA.xlsx"
sheet_names = ['0', '1', 'S', 'B', 'U'] 

for sheet_name in sheet_names:
    df = pd.read_excel(excel_file, header=None, squeeze=True, parse_cols="A", sheetname=sheet_name)
    print(df)
    #do something with df 
Batman
  • 8,571
  • 7
  • 41
  • 80
  • These are just two different writing methods of loops, right? I tested, the error message is the same. – Rita Aug 05 '16 at 09:16
0

Referring to the answer here: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Perhaps you can try this:

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = []
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs.append(xls.parse(x))

Or this as a dict instead of list so you can easily get a particular sheet out to work with

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = {}
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs[x] = xls.parse(x)
Community
  • 1
  • 1
Porz
  • 183
  • 3
  • 15
  • Thanks. But I am not going to combine all the DataFrames. I have to work with every sheet seperately. – Rita Aug 05 '16 at 03:20
  • The method you mentioned in the link doesnot work for me. :-( – Rita Aug 05 '16 at 03:25
  • What is the error you are getting when you say "it does not work for me"? – Porz Aug 05 '16 at 03:31
  • It can also only run the first item in the loop and shows exactly the same error text I posted above. – Rita Aug 05 '16 at 03:45
  • Make sure all the sheets have non-empty first row (the sheets should have header/data anyway to be useful...) – Porz Aug 05 '16 at 03:52
  • I double checked. All the sheets in the excel file have filled first rows. – Rita Aug 05 '16 at 09:15
0

You can simply use:

df = pd.read_excel("C:\\AAA.xlsx", sheet_name=None)  
for key, value in df.items(): 
    ................

When you set 'sheet_name=None', pandas will automatically read all excel sheets from your workbook. And for iterating over sheets and it's content you can iterate over 'df.items()' and do whatever manipulation you'll have to do. In this above code 'key' is the sheets name and 'value' is the content inside sheet. There is no need to create extra list object, in your case 'sheet_names'. Hope it will solve your issue.

Ashu007
  • 745
  • 1
  • 9
  • 13