
I've been trying to read an Excel file with the pandas read_excel method, df_archivo = pd.read_excel(relative_file_path), but it throws this error:

ValueError: File is not a recognized excel file

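from os.path import join

# get_files_in_path, download_all_files and formatear_bitacora are my own helpers, defined elsewhere
def procesar_archivos_bitacora(directorio):
    # Get the list of files in the (SharePoint) directory
    lista_archivos = get_files_in_path(directorio)
    folder = 'input/'
    # Download every listed file into the local input/ folder
    download_all_files(lista_archivos, folder)
    for archivo in lista_archivos:
        local_path = join(folder, archivo)
        print('joined path: ', local_path)
        # formatear_bitacora is where pd.read_excel is called
        formatear_bitacora(local_path)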

As you can see, I call my reading method inside a loop. The printed 'joined path' is what I expect, and the input folder contains only Excel files.
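If the file names ever come from listing a local folder (for example os.listdir('input/')) rather than from SharePoint, it may be worth filtering them before the loop: Excel leaves lock files named `~$<name>.xlsx` next to any workbook that is open, and those are not valid workbooks. A minimal sketch; `filter_workbook_names` and the set of accepted extensions are assumptions, not part of the original code:

```python
from pathlib import PurePath

# Extensions treated as Excel workbooks (an assumption; extend as needed).
WORKBOOK_SUFFIXES = {".xlsx", ".xls"}


def filter_workbook_names(names):
    """Return only names that look like real workbooks.

    Hypothetical helper: drops names without an Excel extension and
    Excel's temporary lock files, whose names start with '~$'.
    """
    kept = []
    for name in names:
        path = PurePath(name)
        if path.name.startswith("~$"):
            continue  # lock file created while the workbook is open in Excel
        if path.suffix.lower() in WORKBOOK_SUFFIXES:
            kept.append(name)
    return kept


print(filter_workbook_names(["a.xlsx", "~$a.xlsx", "notes.txt", "B.XLS"]))
# ['a.xlsx', 'B.XLS']
```

The extension check is case-insensitive because files copied from Windows shares often carry upper-case suffixes.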

The steps I'm following are:

  1. I download the files from a SharePoint directory
  2. I join the download folder path with each downloaded file name
  3. Then I try to read each file in a loop
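One thing worth checking in step 1 is how the downloaded bytes reach disk, since the download code isn't shown. An .xlsx file is a ZIP archive, so if the content is decoded and written as text, or the file is read before its handle has been closed and flushed, the result can be a truncated or altered file that fails with BadZipFile. A minimal sketch, assuming a hypothetical `save_download` helper that receives the raw response bytes from whatever SharePoint client is in use:

```python
import os


def save_download(content: bytes, folder: str, name: str) -> str:
    """Write downloaded bytes to folder/name and return the path.

    Hypothetical helper: `content` is assumed to be the raw response body
    (the undecoded bytes from the SharePoint client), not text.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    # 'wb' writes the bytes unchanged; leaving the with-block closes
    # (and flushes) the file before anything tries to read it back.
    with open(path, "wb") as fh:
        fh.write(content)
    return path
```

If every download goes through a function like this and the read loop only starts after all calls have returned, pd.read_excel should not see a half-written file.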
  • Do these files have the .xls or .xlsx extension or both? – aozk Jul 26 '21 at 19:16
  • @aozk all of those files have the .xlsx extension. – B. Guerrero Jul 26 '21 at 19:21
  • Can you try pd.read_excel with the parameter engine='openpyxl'? – aozk Jul 26 '21 at 19:28
  • @aozk When I tried it, it raised this error: BadZipFile. – B. Guerrero Jul 26 '21 at 20:02
  • There must be a problem with the files. Can you open them in Excel without any errors or warnings? – aozk Jul 26 '21 at 20:10
  • @aozk I followed your suggestion: 1) If I download the file directly, it opens fine. 2) If I follow the same steps described above in my Jupyter notebook, it works too. 3) But if I run my .py file with those steps from VS Code, it fails: the file gets corrupted. – B. Guerrero Jul 26 '21 at 20:15
  • I ran into this issue when opening an *old* XLS file (Excel 2), and pandas could not read it. I ended up using `xlrd` to read the file manually into a `StringIO`, then feeding that into pandas as a CSV. (Perhaps the extension should be `.xls` and it has been mislabeled, which would be misleading as to the true issue?) – S3DEV Aug 25 '21 at 08:33
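Following up on the mislabeled-extension idea, a file's first bytes show what it really is, whatever its name says: .xlsx files are ZIP archives starting with `PK\x03\x04`, legacy .xls files are OLE2 documents starting with `D0 CF 11 E0 A1 B1 1A E1`, and a download that went wrong (an error or login page saved under an Excel name) often starts with `<`. A minimal sketch; `sniff_spreadsheet` and its return labels are illustrative, not part of pandas:

```python
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx / .xlsm (Office Open XML is a ZIP archive)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 container)


def sniff_spreadsheet(path: str) -> str:
    """Guess a file's real format from its leading bytes.

    Returns 'xlsx', 'xls', 'html' or 'unknown'. Hypothetical helper.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.lstrip().startswith(b"<"):
        return "html"  # HTML/XML text, e.g. an error or login page
    return "unknown"
```

Printing sniff_spreadsheet(local_path) just before formatear_bitacora(local_path) in the loop would show whether the files in input/ are really workbooks; anything other than 'xlsx' would explain both the ValueError and the BadZipFile.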

1 Answer


Since version 2.0, xlrd reads only .xls files. There are two workarounds:

The first, which I recommend, is to install openpyxl:

pip install openpyxl

Then pass 'openpyxl' as the engine parameter of pd.read_excel:

pd.read_excel(local_path, engine='openpyxl')

The second is to downgrade xlrd to a release that still reads .xlsx files:

pip install xlrd==1.2.0
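To see which case applies before picking a workaround, the installed xlrd version can be read at runtime: only releases before 2.0 can open .xlsx. A small sketch using the standard library's importlib.metadata (Python 3.8+); both function names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def xlrd_reads_xlsx(xlrd_version: str) -> bool:
    """True if this xlrd version can still open .xlsx (releases before 2.0)."""
    major = int(xlrd_version.split(".")[0])
    return major < 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if it is missing."""
    try:
        return version("xlrd")
    except PackageNotFoundError:
        return None


v = installed_xlrd_version()
if v is None:
    print("xlrd is not installed")
else:
    print(f"xlrd {v}; reads .xlsx: {xlrd_reads_xlsx(v)}")
```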
aozk
  • OP's error message suggests that xlrd is not the issue... https://stackoverflow.com/questions/65250207/pandas-cannot-open-an-excel-xlsx-file – BigBen Jul 26 '21 at 19:57