I am reading Excel files (.xls) from S3 with Pandas. The code works for some files but fails for the rest. The files arrive daily with different values each day, but their structure is the same, so they can be considered identical in format.
The error is:
ValueError: Excel file format cannot be determined, you must specify an engine manually.
at this line:
pd.read_excel(io.BytesIO(excel), sheet_name=sheet, index_col=None, header=[0])
I have tried the solutions suggested on the internet. Specifying engine='openpyxl'
gives the error:
zipfile.BadZipFile: File is not a zip file
and specifying engine='xlrd'
gives the error:
expected str, bytes or os.PathLike object, not NoneType
I am using boto3 to connect to the S3 resource. Again, my code works fine for some of the files. What could cause this different behaviour for Excel files that look identical?
My question is very similar to "Excel file format cannot be determined with Pandas, randomly happening", but that one doesn't have a proper answer yet.
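To narrow this down, one check that might help is looking at the first bytes of each downloaded object, since as far as I understand Pandas guesses the format from the file's leading signature rather than from the .xls extension. The sketch below is only a diagnostic under that assumption: sniff_format is a hypothetical helper of my own (not part of Pandas or boto3), and the list of signatures it checks is deliberately incomplete.

```python
# Hypothetical diagnostic helper, not a Pandas or boto3 API.
# It guesses what a downloaded object really is from its leading bytes.

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # container used by legacy .xls
ZIP_MAGIC = b"PK\x03\x04"                          # zip container used by .xlsx
UTF8_BOM_AND_SPACE = b"\xef\xbb\xbf \t\r\n"        # bytes to skip before text checks


def sniff_format(data: bytes) -> str:
    """Return a rough label for the content type of `data`."""
    if not data:
        return "empty"
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # Some exporters write HTML or XML and simply name the file .xls.
    head = data[:512].lstrip(UTF8_BOM_AND_SPACE).lower()
    if head.startswith((b"<html", b"<!doctype", b"<?xml", b"<table")):
        return "markup"
    return "unknown"
```

Calling print(sniff_format(excel)) on the same bytes that are passed to io.BytesIO above, for one working file and one failing file, should show whether the failing ones are really .xls. A result of "empty" would point at the download step, while "markup" or "unknown" would suggest the file is HTML, XML, CSV or something else saved with an .xls name.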