
I have a script that automatically downloads Excel files from a website using Selenium.

What I want to do is create one big master file. I locate the most recently downloaded file with:

  list_of_files = glob.glob(r"C:\Users\Raymond.van.Zonnevel\*********\*")
  latest_file = max(list_of_files, key=os.path.getctime)

Then I want to open the file. But this results in an error

  Temp_df = pd.read_excel(str(latest_file))
  XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<html xm'

I think this has something to do with the fact that I download the files using selenium.
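
One way to check that suspicion is to look at the first bytes of the file, since the `b'<html xm'` in the error message is what xlrd found where it expected an Excel header. The following is a minimal sketch, not a definitive diagnosis: `sniff_format` is a hypothetical helper name, and the byte signatures only distinguish a ZIP container (used by .xlsx), the legacy OLE2 container (used by .xls), and text markup such as HTML or XML.

```python
def sniff_format(path):
    """Guess the real format of a file from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"    # ZIP container, which is what .xlsx files use
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"     # OLE2 container used by legacy binary .xls files
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<"):
        return "markup"  # HTML or XML saved under a spreadsheet extension
    return "unknown"
```

If this returns `"markup"` for `latest_file`, the download is probably an HTML table with a spreadsheet extension, and `pd.read_html` (which needs an HTML parser such as lxml installed) may be a better fit than `pd.read_excel`.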

What I ultimately want to do is:

  1. download the file --> Done
  2. locate the file --> Done
  3. open the file --> this is where I get my error
  4. take the 3rd row and paste in a master file
  5. delete the old file and repeat for all next downloads (in for loop)
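
The locate, open, copy and delete steps above could be sketched roughly as follows. This is only an outline under stated assumptions: `newest_file` and `harvest_latest` are hypothetical names, the reader is passed in as `read_rows` (any function that returns the file's rows as a list), and "3rd row" is taken to mean position 2 in that list.

```python
import glob
import os


def newest_file(download_dir):
    """Return the most recently created file in download_dir, or None if empty."""
    files = [p for p in glob.glob(os.path.join(download_dir, "*"))
             if os.path.isfile(p)]
    return max(files, key=os.path.getctime) if files else None


def harvest_latest(download_dir, read_rows, master_rows):
    """Read the newest download, keep its third row, then delete the file.

    read_rows is an assumed callable: path -> list of rows.
    Returns True if a file was processed, False if the folder was empty.
    """
    path = newest_file(download_dir)
    if path is None:
        return False
    rows = read_rows(path)
    master_rows.append(rows[2])  # third row by position (index 2)
    os.remove(path)              # only reached if reading and indexing succeeded
    return True
```

Inside the download loop, `harvest_latest` would be called once after each download finishes. With pandas, `read_rows` could be something like `lambda p: pd.read_excel(p, header=None).values.tolist()`, and the collected `master_rows` can be written out once at the end.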

How would I go about opening and using the downloaded files?

  • If you do `print(latest_file)`, what does it show? Also, you may want to look [here](https://stackoverflow.com/questions/9623029/python-xlrd-unsupported-format-or-corrupt-file) – Partha Mandal Jul 27 '20 at 18:55
  • That was just a check to see whether the correct path was followed (it was). I think the issue lies in the fact that it was downloaded via a driver, not that it might not be an xlsx (it is an xlsx file). When opening the file I get a message that the extension is different than when it was downloaded, asking whether the file should be opened or not. – Raymond van zonneveld Jul 28 '20 at 06:34
