
In Google Colab, I received an xls file and was trying to convert it to a pandas DataFrame as follows.

res=requests.get(url, cookies=cookies)
df=pd.read_excel(res.content)

However, it gives me an error message as follows

Excel file format cannot be determined, you must specify an engine manually.

I tried openpyxl

df=pd.read_excel(bb, engine='openpyxl')

but it gave me another error

File is not a zip file

I have tried 'xlrd', but it also gave me an error message.

Someone suggested here that a hidden xls file might be the cause and that simply removing it could solve the problem, but I could not find any hidden file in the Google Colab directory using this command

!ls -al

What's strange is that these errors occur randomly. It often works well. Sometimes it succeeds only once in 20 attempts, and at other times it fails only once in 20 attempts.

Can anyone help me?

JHP
  • Have you tried what the answers to this [question](https://stackoverflow.com/questions/33873423/xlsx-and-xlsm-files-return-badzipfile-file-is-not-a-zip-file) suggest? – Tedpac Feb 28 '22 at 09:41
  • If nothing works after trying that, could you share the Excel file so the error can be reproduced? You could change the names of the columns and sheets if necessary. – Tedpac Feb 28 '22 at 09:42
  • Thank you for your comment Tedpac. I don't think my problem has anything related to 'openpyxl' , because my response file should be an 'xls' file not 'xlsx'. – JHP Feb 28 '22 at 09:47
  • It's true, I thought "openpyxl" worked with "xls" files. Could you tell what error you got when trying to use "xlrd" as the engine? – Tedpac Feb 28 '22 at 09:57
  • [Here](https://docs.google.com/spreadsheets/d/1bvdXteh4Q-0hoXmceNTnMPC8UKsRK_pW/edit#gid=1775582721) is the xls file. But please remember that I have to receive it through Python `requests`. – JHP Feb 28 '22 at 09:58
  • The error message when using 'xlrd' is `Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n\n\n'` – JHP Feb 28 '22 at 10:01
  • Using `df = pd.read_excel("/content/drive/MyDrive/file.xls", engine="xlrd")` worked perfectly on my Google Colab. I'm using `pandas 1.3.5` and `xlrd 2.0.1`. Instead of using `requests` you could directly use `df = pd.read_excel(url, engine="xlrd")`. – Tedpac Feb 28 '22 at 10:20
  • It doesn't work. As `pd.read_excel` has no place for cookies, I have to use `res=requests.get(url, cookies=cookies)`. I have to be logged in to the website and get cookies to access that xls file. I still don't know what the problem is. What is frustrating is that sometimes it works well. – JHP Feb 28 '22 at 10:33
  • Honestly, I can't think of what the problem is, and I don't have a way to reproduce it either (because in my Google Colab it worked fine), additionally, it's something that hasn't happened to me. I hope someone can help you. – Tedpac Feb 28 '22 at 10:37
  • Thank you Tedpac. I appreciate your comment. – JHP Feb 28 '22 at 10:39
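
The `xlrd` error quoted in the comments (`Expected BOF record; found b'\n\n\n\n\n\n\n\n'`) suggests that the response body sometimes does not start with Excel data at all. One possible explanation, not confirmed in this thread, is that the server occasionally returns something else, such as an HTML page. A minimal sketch for checking this before parsing is shown below. It inspects the leading bytes of `res.content`: legacy `.xls` files start with the OLE2 signature, and `.xlsx` files start with the zip signature. The helper names `sniff_excel_format` and `read_excel_bytes` are hypothetical and made up for illustration.

```python
import io

# File signatures ("magic bytes") at the start of each format.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a zip archive


def sniff_excel_format(content: bytes) -> str:
    """Return 'xls', 'xlsx', or 'unknown' based on the leading bytes."""
    if content.startswith(OLE2_MAGIC):
        return "xls"
    if content.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def read_excel_bytes(content: bytes):
    """Parse downloaded bytes with the matching engine, or fail loudly.

    Hypothetical helper: raises ValueError and shows the first bytes
    when the payload does not look like an Excel file.
    """
    fmt = sniff_excel_format(content)
    if fmt == "unknown":
        raise ValueError(
            f"Response is not an Excel file; first bytes: {content[:60]!r}"
        )
    import pandas as pd  # imported here so the sniffing helper has no dependencies

    engine = "xlrd" if fmt == "xls" else "openpyxl"
    return pd.read_excel(io.BytesIO(content), engine=engine)


# Usage with the request from the question (url and cookies as defined there):
# res = requests.get(url, cookies=cookies)
# df = read_excel_bytes(res.content)
```

If the `ValueError` fires on the failing runs, printing `res.status_code` and `res.headers.get("Content-Type")` alongside the first bytes may show what the server actually sent.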
