I'm trying to read a Excel file (.xlsx) and I'm getting the error "IndexError: list index out of range". My code is simple:
import pandas as pd
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')
The error:
File "<ipython-input-16-fd0112985376>", line 2, in <module>
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 364, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1233, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 522, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 420, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 533, in load_workbook
return load_workbook(
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 94, in __init__
self.named_styles = self._merge_named_styles()
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 114, in _merge_named_styles
self._expand_named_style(style)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 124, in _expand_named_style
xf = self.cellStyleXfs[named_style.xfId]
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\cell_style.py", line 185, in __getitem__
return self.xf[idx]
IndexError: list index out of range
Maybe the Excel file is not proberly formatted, as stated here and here. But when I manually save the Excel file, the file is a "Excel Workbook (*.xlsx)" not a "Strict XML Open Spreadsheet", like in those other questions:
Print Screen: saving the Excel file manually
I downloaded this file from the web, so maybe the file is broken, but I don't know how to check it.
Thanks for your attention!
Edit 1:
Here is a print screen from the website's HTML
I don't know HTML, but I found strange that the file id is "xls-link" and its href is "./Html/DIARIO_16-11-2021.xlsx".
Like @Wayne said, when he downloaded the file, it came as .xls. After reading this answer, I tried running
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls')
And got the error "[Errno 2] No such file or directory: 'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls' "
Then I tried to open the file manually and save it as .xls. After running the code above, it actually worked!
But now my problem is: I will have to manually open and save as .xls all +5000 daily files that I need, which is a tedious quest. Does anyone know how I could do this automatically (without actually open it, because I still can't figure it out)?