BeautifulSoup and Selenium: download files from page

Question

I am using Selenium to navigate to the URL because BeautifulSoup with html.parser does not return all of the page content. I have gathered the links to the files on the page and stored them in a list. Next, I want to go through each file and write its contents to a CSV, but each href uses the /file-browser-api/download/ format.

Is there a way to open each of these files and then write its contents to a CSV? I could technically use Selenium to click each link, but the package does not support iteration.

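import csv

from bs4 import BeautifulSoup
from selenium import webdriver

# Assumption: the original post does not show how the driver was created
driver = webdriver.Chrome()

url = "https://marketplace.spp.org/pages/rtbm-lmp-by-location#%2F2021%2F02%2FRePrice"
driver.get(url)
driver.maximize_window()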

html = driver.page_source
soup = BeautifulSoup(html, "lxml")

all_files = []
files = soup.find_all(class_="files")
for file in files:
    href = file.get('href')
    # Skip matched elements that carry no href attribute
    if href is not None:
        all_files.append(href)

# Test to see if I could download the first file; this raises the error shown below
with open(all_files[0], 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(headers)
    writer.writerows(row for row in rows if row)

[Errno 2] No such file or directory: '/file-browser-api/download/rtbm-lmp-by-location?path=%2F2021%2F02%2FRePrice%2FRTBM-LMP-SL-202102051005-R1-RC4.csv'

Perhaps you want to use the `requests` library; see https://stackoverflow.com/questions/13137817/how-to-download-image-using-requests. — Galunid, Feb 22 '21 at 15:04
I haven't tried using requests to call the links. Would this actually let me download the files in the mentioned format? — Zachary Wyman, Feb 22 '21 at 15:06
The provided link shows how to download files using requests. You'd have to modify it to suit your needs yourself, but yes, it will allow you to do just that. — Galunid, Feb 22 '21 at 15:21
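
The suggestion in the comments could look roughly like the sketch below. It is not a tested answer for this site, only an illustration under some assumptions: the hrefs in all_files are relative to https://marketplace.spp.org (the host in the question's URL), the download endpoint responds to a plain GET without the browser's cookies, and the local file name can be taken from the `path` query parameter. The helper names absolute_url, filename_from_href, and download_all are hypothetical and chosen only for this example.

```python
import os
import posixpath
from urllib.parse import parse_qs, urljoin, urlparse

# Assumption: site root taken from the URL in the question
BASE_URL = "https://marketplace.spp.org"


def absolute_url(href, base=BASE_URL):
    """Resolve a site-relative href against the site root."""
    return urljoin(base, href)


def filename_from_href(href):
    """Use the last segment of the `path` query parameter as the local file name."""
    query = parse_qs(urlparse(href).query)  # parse_qs also decodes %2F to "/"
    remote_path = query.get("path", [""])[0]
    return posixpath.basename(remote_path) or "download.csv"


def download_all(hrefs, out_dir="downloads"):
    """Fetch each href with requests and write the raw response body to out_dir."""
    # Imported here so the two helpers above work without the third-party package
    import requests

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with requests.Session() as session:
        for href in hrefs:
            response = session.get(absolute_url(href), timeout=30)
            response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
            target = os.path.join(out_dir, filename_from_href(href))
            with open(target, "wb") as f:
                f.write(response.content)
            saved.append(target)
    return saved
```

With the list from the question, the call would be download_all(all_files). The [Errno 2] error occurs because open() treats the href as a path on the local disk; the href has to be fetched over HTTP first, and only the response body is written to a local file. If the server turns out to require the browser session, the cookies from driver.get_cookies() would probably need to be copied into the requests session before downloading.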
