I am working on a project where I need to extract many files from a database that has no API. I have to go through a web page by constructing URLs like this one:
https://bmsnet.cas.dtu.dk/Trendlogs/ExportCSV_TrendlogRecordData/1
The integer at the end of the URL (1 in the example above) will range from 1 to 35000. When I open such a URL, I get a pop-up window for saving the file:
[Image: pop-up window for file download]
My question is how to automate this process using Python. I can generate these URLs and handle the data from the downloaded file (so far I have done this manually). The step I am stuck on is writing the Python code that clicks the "Save as" button. Eventually I want code that does the following:
- Construct the URL
- Save the file arising from the pop-up window
- Load/read and process the data
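As a rough sketch of the first and last steps only (constructing the URL and reading a file once it has been saved), something like the following could serve as a starting point. The helper names `build_export_url` and `read_downloaded_csv` are hypothetical, and the comma delimiter and UTF-8 encoding are assumptions about the exported files, not something confirmed for this site.

```python
import csv
from pathlib import Path

# Base of the export URL shown above; the record integer is appended to it.
BASE_URL = "https://bmsnet.cas.dtu.dk/Trendlogs/ExportCSV_TrendlogRecordData"


def build_export_url(record_id):
    """Return the export URL for one record integer (hypothetical helper)."""
    return "{0}/{1}".format(BASE_URL, record_id)


def read_downloaded_csv(path, delimiter=","):
    """Read an already saved CSV file into a list of rows.

    The delimiter and encoding are assumptions; adjust them to match
    the files the site actually produces.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


# Build the URLs for the first three records as a small example.
urls = [build_export_url(i) for i in range(1, 4)]
```

The middle step, saving the file that the pop-up offers, is the part this sketch does not cover.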
EDIT:
I have now found a partial solution using Selenium:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pyautogui
import time
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
dl_path = "MY_LOCAL_DOWNLOAD_PATH"
profile = FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", dl_path)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk",
"text/plain,text/x-csv,text/csv,application/vnd.ms-excel,application/csv,application/x-csv,text/csv,text/comma-separated-values,text/x-comma-separated-values,text/tab-separated-values,application/pdf")
driver = webdriver.Firefox(firefox_profile=profile)
URL = "https://bmsnet.cas.dtu.dk"
driver.get(URL)
# Let the page load
time.sleep(5)
username = driver.find_element_by_id("Email")
password = driver.find_element_by_id("Password")
username.send_keys("my_username")
password.send_keys("my_password")
elem = driver.find_element_by_xpath("/html/body/div[2]/div/div[1]/section/form/div[4]/div/input")
elem.click()
time.sleep(5)
start = 1
stop = 10
for file_integer in range(start, stop):
    URL = "https://bmsnet.cas.dtu.dk/Trendlogs/ExportCSV_TrendlogRecordData/{0}".format(file_integer)
    driver.get(URL)
    time.sleep(5)
    print('Done downloading integer: {0}'.format(file_integer))
The code above works, but only once: the for loop gets stuck after the first iteration. Any idea what I am doing wrong?
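One thing that might help while debugging is to replace the fixed `time.sleep(5)` with an explicit check that the file has actually arrived in the download folder. The sketch below is a minimal, hypothetical helper (`wait_for_new_file` is not part of Selenium) that polls the directory. It assumes Firefox keeps a temporary `.part` file next to a download while it is still in progress. It does not by itself explain why the loop hangs, but it would show whether the second download ever starts.

```python
import time
from pathlib import Path


def wait_for_new_file(directory, known_files, timeout=30.0, poll_interval=0.5):
    """Wait until a new, fully written file shows up in `directory`.

    known_files: set of file names that were present before the download.
    Returns the Path of the new file, or None if the timeout expires.
    Assumption: an in-progress Firefox download leaves a '*.part' file.
    """
    directory = Path(directory)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = {p.name for p in directory.iterdir() if p.is_file()}
        # Any '.part' file means a download is still being written.
        in_progress = any(name.endswith(".part") for name in names)
        new_names = sorted(
            name for name in names - set(known_files)
            if not name.endswith(".part")
        )
        if new_names and not in_progress:
            return directory / new_names[0]
        time.sleep(poll_interval)
    return None
```

Inside the loop, this could be used by recording the set of file names in `dl_path` before calling `driver.get(URL)` and then calling `wait_for_new_file(dl_path, names_before)` instead of sleeping. A return value of `None` would mean no completed file appeared within the timeout.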
Thank you for your time and help. Looking forward to hearing your ideas on that.