
I am trying to download a file from Chrome using Selenium. The file name always starts with 'TCBT', but the rest of the name is different every time. I only want to download that one file.

Something interesting happens on the last click, the one that triggers the download. The website is slow and the file is large, so after I click download an unpredictable amount of time passes (20-60 seconds). Then a popup window opens and the file starts downloading, which takes another 10-30 seconds.

I have tried several different infinite loops to detect when the file finishes downloading, but the script always either gets stuck in the loop or skips it. I believe it has something to do with the file name, as I can only use a glob to find the file I want. Some examples:

One try:

while True:
    download_folder = os.path.expanduser('~')+'/Downloads/'
    filenames = glob.glob(download_folder+'TCBT*')
    for name in filenames:
        if name.endswith('.crdownload'):
            time.sleep(1)
        if name.endswith('.xls'):
            break
        else:
            continue

This keeps looping no matter what. I am attempting to use the "continue" portion to go back and get the filename because for the first 20-60 seconds (waiting for the popup to begin downloading) there will be no file.

I have also tried using a function I found online:

def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.

    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True

        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True

        seconds += 1
    return seconds

download_wait(download_folder, 30)

When I use this, nothing happens and my script finishes. I'm assuming it is checking the folder, not seeing any file (because it takes 20-60 seconds to begin downloading) and completing.

Any thoughts on how to solve this?

MontyP
  • Are you able to get the href or link for the download button you are clicking? It could be possible to use selenium to get the link for the button, and then use `requests` to download the file, kind of like in [this answer](https://stackoverflow.com/questions/15644964/python-progress-bar-and-downloads). Obviously you don't need the progress bar – C.Nivs Aug 28 '20 at 17:05
  • Thanks for the suggestion. However, the button only has value, onclick, class, type, and ID attributes. Also, the click doesn't lead to a downloadable file URL. What opens is an about:blank popup, and the Excel file shows up at the bottom of Chrome as it downloads. – MontyP Aug 28 '20 at 19:09

2 Answers

0

If your workaround didn't work, you can create a new method to cover the download logic:

  1. Check if the file is present (based on its name).
  2. Inside a loop, read the file's size, wait 5 seconds (or however long you want), and read it again. When two consecutive sizes match, break out of the loop and consider the download done.
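
The two steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name `wait_for_stable_file`, its parameters, and the overall timeout are made up for the example, and it assumes Chrome's partial files end in `.crdownload`, as in the question.

```python
import glob
import os
import time


def wait_for_stable_file(pattern, interval=5.0, timeout=120.0):
    """Return the first file matching `pattern` whose size was the same on
    two consecutive checks, or None if `timeout` seconds pass first.

    Hypothetical helper, not part of Selenium or any library.
    """
    deadline = time.monotonic() + timeout
    last_sizes = {}  # path -> size seen on the previous pass
    while time.monotonic() < deadline:
        for path in glob.glob(pattern):
            if path.endswith('.crdownload'):
                continue  # still a partial Chrome download, ignore it
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file was renamed or removed between glob and stat
            # Step 2: the size is non-zero and unchanged since the last pass.
            if size > 0 and last_sizes.get(path) == size:
                return path
            last_sizes[path] = size
        # Step 1 also lives here: if nothing matched yet, just keep polling.
        time.sleep(interval)
    return None
```

Calling it with something like `wait_for_stable_file(os.path.expanduser('~/Downloads/TCBT*'))` keeps polling while no file exists yet, which covers the 20-60 second gap before the popup starts the download.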
Razvan
  • Thanks for the suggestion, I think my trouble is finding the syntax for the correct solution. – MontyP Aug 31 '20 at 13:58
0

Here is how I solved it. I needed a second break to get out of the outer loop, and I added a few more branches to the checks inside it:

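import glob
import os
import time

# Wait for download
while True:
    download_folder = os.path.expanduser('~')+'/Downloads/'
    filenames = glob.glob(download_folder+'TCBT*')
    # Leave the outer loop once a TCBT file exists and none is still a partial download.
    if len(filenames) > 0 and not any('.crdownload' in name for name in filenames):
        break
    # The break/continue statements below only affect this inner for loop.
    for name in filenames:
        if name.endswith('.crdownload'):
            continue
        if name.endswith('.xls'):
            print('')
            print('Download complete.')
            print('')
            break
        else:
            break
    time.sleep(1)  # pause between polls instead of spinning the CPU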
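
For anyone who wants the same check with a timeout, here is one possible variant. It is a sketch, not the code I actually ran: the function name `wait_for_download` and the `ignore` parameter are invented for illustration, and it assumes a partial download shows up with a `.crdownload` suffix.

```python
import glob
import os
import time


def wait_for_download(folder, prefix='TCBT', timeout=120.0, poll=1.0, ignore=()):
    """Block until a finished `prefix*` file exists in `folder`.

    `ignore` holds paths that were already there before the click, so an
    old report is not mistaken for the new one. Returns the finished
    file's path, or raises TimeoutError after `timeout` seconds.
    Hypothetical helper written for this example.
    """
    pattern = os.path.join(folder, prefix + '*')
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = [n for n in glob.glob(pattern) if n not in ignore]
        partial = [n for n in names if n.endswith('.crdownload')]
        finished = [n for n in names if not n.endswith('.crdownload')]
        # Same condition as the outer-loop break above: at least one
        # matching file, and none of them still downloading.
        if finished and not partial:
            return finished[0]
        time.sleep(poll)
    raise TimeoutError(f'no finished {prefix}* file in {folder} after {timeout}s')
```

Taking a snapshot such as `set(glob.glob(os.path.join(folder, 'TCBT*')))` before clicking and passing it as `ignore` avoids returning a file left over from an earlier run.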
MontyP