
I'm creating an application that downloads PDFs from a website and saves them to disk. I understand the Requests module can download files, but as far as I can tell it does not handle the logic around the download (file size, progress, time remaining, etc.).

I've written the program using Selenium so far and would like to eventually incorporate it into a Tkinter GUI app.

What would be the best way to handle the downloading, tracking and eventually creating a progress bar?

This is my code so far:

from selenium import webdriver
from time import sleep 
import requests

import secrets

class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """
    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')

        sleep(1)

        # Locate elements and enter login details
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)

        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)

        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()

        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access file tree, navigate to the PDFs and download them"""
        # Wait for all elements to load
        sleep(3)

        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)

        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()


bot = manual_grabber()
bot.login()
bot.download_file()

So in summary, I'd like this code to download PDFs from a website, store them in a specific directory (named after its parent folder in the jQuery File Tree), and keep track of the progress (file size, time remaining, etc.).

Here is the DOM:

[Image: screenshot of the page's DOM, not included]

I hope this is enough information. If anything else is needed, please let me know.

j4yman

1 Answer


I would recommend using tqdm and the requests module for this. Here is sample code that streams the download and updates a progress bar as each chunk arrives.

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat"  # large test file
# Stream the response so the body can be read chunk by chunk.
response = requests.get(url, stream=True)
# Content-Length gives the full file size; fall back to 0 if the server omits it.
total_size_in_bytes = int(response.headers.get('content-length', 0))
block_size = 1024  # bytes read per chunk (1 KiB)
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))  # replace with your Tkinter widget update
        file.write(data)
progress_bar.close()
# A byte-count mismatch suggests the download was incomplete.
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
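
For the Tkinter side, one common approach is to run the download loop in a background thread and pass progress to the GUI through a queue, because Tkinter widgets should only be touched from the main thread. The following is a rough sketch of that idea, not a drop-in solution. The names `stream_to_queue` and `run_gui` are made up for illustration, `chunks` stands in for `response.iter_content(block_size)`, and it assumes the server reported a non-zero Content-Length.

```python
import queue
import threading


def stream_to_queue(chunks, total_bytes, progress_queue):
    """Consume an iterable of byte chunks and report progress on a queue.

    In the real program `chunks` would be response.iter_content(block_size)
    and each chunk would also be written to the open file (assumption).
    """
    downloaded = 0
    for chunk in chunks:
        downloaded += len(chunk)
        progress_queue.put(("progress", downloaded, total_bytes))
    progress_queue.put(("done", downloaded, total_bytes))


def run_gui(chunks, total_bytes):
    """Show a ttk.Progressbar fed by a background thread (needs a display)."""
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    # Assumes total_bytes > 0, i.e. Content-Length was present.
    bar = ttk.Progressbar(root, length=300, mode="determinate",
                          maximum=total_bytes)
    bar.pack(padx=20, pady=20)
    progress_queue = queue.Queue()

    def poll():
        # Drain every pending message; only the main thread touches widgets.
        try:
            while True:
                kind, downloaded, _total = progress_queue.get_nowait()
                bar["value"] = downloaded
                if kind == "done":
                    return  # stop polling once the worker has finished
        except queue.Empty:
            pass
        root.after(100, poll)  # check the queue again in 100 ms

    threading.Thread(
        target=stream_to_queue,
        args=(chunks, total_bytes, progress_queue),
        daemon=True,
    ).start()
    root.after(100, poll)
    root.mainloop()
```

Calling `run_gui(response.iter_content(block_size), total_size_in_bytes)` would wire it to the request above. Polling with `root.after` keeps the window responsive while the worker thread does the blocking network reads.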

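Note that block_size is the size of each chunk read from the response, not the file size; the file size is total_size_in_bytes, taken from the Content-Length header. The time remaining can be estimated by dividing the bytes still to download by the average download speed so far (bytes received divided by elapsed time); tqdm shows a similar estimate in its bar. Here is an alternative: How to measure download speed and progress using requests?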
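
If you want those figures for your own labels rather than tqdm's console bar, the arithmetic is small enough to do by hand. Below is a minimal sketch. `DownloadStats` is a hypothetical helper name, not part of requests or tqdm, and it uses a plain average speed rather than the smoothed rate tqdm computes.

```python
import time


class DownloadStats:
    """Track bytes received and derive speed and estimated time remaining."""

    def __init__(self, total_bytes, clock=time.monotonic):
        self.total_bytes = total_bytes  # from Content-Length; 0 if unknown
        self.downloaded = 0
        self._clock = clock             # injectable so it can be tested
        self._start = clock()

    def update(self, n_bytes):
        """Record that another n_bytes arrived (call once per chunk)."""
        self.downloaded += n_bytes

    @property
    def speed(self):
        """Average bytes per second since the tracker was created."""
        elapsed = self._clock() - self._start
        return self.downloaded / elapsed if elapsed > 0 else 0.0

    @property
    def eta_seconds(self):
        """Estimated seconds left, or None when it cannot be computed."""
        speed = self.speed
        if self.total_bytes <= 0 or speed <= 0:
            return None
        remaining = max(self.total_bytes - self.downloaded, 0)
        return remaining / speed
```

In the download loop you would create `stats = DownloadStats(total_size_in_bytes)` before iterating and call `stats.update(len(data))` for each chunk, then read `stats.speed` and `stats.eta_seconds` whenever the display is refreshed.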

AzyCrw4282