Downloading PDFs and tracking downloads with Python

Question

I'm creating an application that downloads PDFs from a website and saves them to disk. I understand the Requests module can do the download itself, but that it does not handle the logic around the download (file size, progress, time remaining, etc.).

I've created the program using Selenium so far and would like to eventually incorporate it into a Tkinter GUI app.

What would be the best way to handle the downloading and tracking, and eventually to create a progress bar?

This is my code so far:

from selenium import webdriver
from time import sleep
import requests

# Local module holding username/password (note: shadows the stdlib "secrets" module)
import secrets


class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """

    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')

        sleep(1)

        # Locate elements and enter login details
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)

        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)

        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()

        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access file tree, navigate to the PDFs and download"""
        # Wait for all elements to load
        sleep(3)

        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)

        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()


bot = manual_grabber()
bot.login()
bot.download_file()

So, in summary, I'd like this code to download PDFs from a website, store them in a specific directory (named after its parent folder in the jQuery File Tree) and keep track of the progress (file size, time remaining, etc.).
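
For reference, the tracking described above (file size, progress, time remaining) can be layered on top of a streamed download. The following is only a minimal sketch, not a tested solution for this site: the helper names `progress_stats`, `save_stream` and `download_pdf` are made up for illustration, and it assumes the PDF is reachable by a plain GET on a `requests.Session` that already carries the login cookies and that the server sends a `Content-Length` header.

```python
import time
from pathlib import Path


def progress_stats(done, total, elapsed):
    """Return (percent, bytes_per_second, seconds_remaining).

    Any value that cannot be computed (unknown total, no time elapsed) is None.
    """
    speed = done / elapsed if elapsed > 0 else None
    if not total:
        return None, speed, None
    percent = 100.0 * done / total
    remaining = (total - done) / speed if speed else None
    return percent, speed, remaining


def save_stream(chunks, total, dest, on_progress=None):
    """Write an iterable of byte chunks to dest, reporting progress per chunk."""
    dest = Path(dest)
    # Create the target folder (e.g. one named after the file tree's parent folder).
    dest.parent.mkdir(parents=True, exist_ok=True)
    done = 0
    start = time.monotonic()
    with open(dest, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            done += len(chunk)
            if on_progress is not None:
                on_progress(*progress_stats(done, total, time.monotonic() - start))
    return done


def download_pdf(session, url, dest, on_progress=None, chunk_size=64 * 1024):
    """Stream url to dest. `session` is assumed to be a requests.Session."""
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        # Content-Length may be absent; the total is then treated as unknown.
        total = int(response.headers.get("Content-Length", 0)) or None
        chunks = response.iter_content(chunk_size=chunk_size)
        return save_stream(chunks, total, dest, on_progress)
```

The `on_progress` callback receives the percentage, the speed in bytes per second and the estimated seconds remaining, so a GUI could use it to update a progress bar. To reuse the Selenium login, the cookies returned by `driver.get_cookies()` would presumably have to be copied into the session first (for example with `session.cookies.set(name, value)`); whether this site accepts that is not something the question confirms.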

Here is the DOM:

I hope this is enough information. If any more is required, please let me know.

I'm sure you can use the Chrome downloads tab to see the progress and file information. You can refer to this post, https://stackoverflow.com/questions/34548041/selenium-give-file-name-when-downloading/56570364#56570364, which waits until the file download is completed. You can leverage the logic that keeps checking the % of download (progress bar). — supputuri, Aug 26 '20 at 14:00
@supputuri The only problem I envisage is at certain points of the download, I need to change the save directory and file names etc. — j4yman, Aug 26 '20 at 14:17
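
On the concern about changing the save directory and file names: one possible workaround, sketched below, is to let Chrome download everything into a single staging folder and move or rename each file once it has finished. This is a different technique from the one in the linked answer. It relies on Chrome keeping unfinished downloads under a temporary `.crdownload` name, it assumes the staging folder is otherwise empty, and the function names `wait_for_download` and `move_download` are hypothetical.

```python
import shutil
import time
from pathlib import Path


def wait_for_download(download_dir, timeout=60.0, poll=0.5):
    """Wait until download_dir holds a finished file and no '.crdownload' files.

    Returns the most recently modified finished file; raises TimeoutError
    if nothing has finished before the timeout expires.
    """
    download_dir = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        files = [p for p in download_dir.iterdir() if p.is_file()]
        partial = [p for p in files if p.suffix == ".crdownload"]
        finished = [p for p in files if p.suffix != ".crdownload"]
        if finished and not partial:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {download_dir}")
        time.sleep(poll)


def move_download(src, target_dir, new_name=None):
    """Move a finished download into target_dir, optionally renaming it."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / (new_name or Path(src).name)
    shutil.move(str(src), str(dest))
    return dest
```

The staging folder itself can be set through the `download.default_directory` entry of Chrome's `prefs` experimental option when the driver is created. After each click in the file tree, `move_download(wait_for_download(staging), target_folder, new_name)` would then file the PDF under the folder name taken from the tree.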

0 Answers