
I'm trying to download and rename files (around 60 per page) using Selenium and have hit a roadblock.

Here is what I have tried:

1. I tried the solution offered by supputuri, which goes through Chrome's chrome://downloads download manager. I used the code provided but ran into two issues: the opened tab does not close properly (which I can fix), and, more importantly, the helper function keeps returning None as the file name even though I can find the downloaded files in my download directory. This approach could work, but it probably needs some modification in the Chrome console (JavaScript) part, which I have no experience with.

import time

# method to get the downloaded file name; `driver` is the existing Chrome WebDriver
def getDownLoadedFileName(waitTime):
    main_window = driver.current_window_handle
    # open a new tab and switch to it
    driver.execute_script("window.open()")
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time() + waitTime
    fileName = None
    try:
        while time.time() < endTime:
            try:
                # get downloaded percentage of the most recent download
                downloadPercentage = driver.execute_script(
                    "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
                # check if downloadPercentage is 100 (otherwise the script will keep waiting)
                if downloadPercentage == 100:
                    # read the file name once the download is completed
                    fileName = driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content  #file-link').text")
                    break
            except Exception:
                # the element may not exist (yet); the JavaScript query then raises
                pass
            time.sleep(1)
    finally:
        # close the downloads tab and return to the original window
        driver.close()
        driver.switch_to.window(main_window)
    # None means no download reached 100% (or the query kept failing) within waitTime
    return fileName

Source post: "Selenium give file name when downloading"

2. The second approach I looked at is the one offered by Red in the post below. Since I'm downloading one file at a time, I figured I could always find the most recent file, rename it once the download completes, and repeat. My issue with this approach: once I have the file, I can't find a way to get its name. I checked the Python methods for file objects and none of them returns the file name.
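import os
import time

def latest_download_file(num_file, path):
    """Wait until `path` holds at least `num_file` entries and the newest
    one is no longer a partial Chrome download, then return its name."""
    while True:
        try:
            # sort entries by modification time so the newest file is last
            files = sorted(os.listdir(path),
                           key=lambda name: os.path.getmtime(os.path.join(path, name)))
        except FileNotFoundError:
            # a .crdownload file was renamed between listdir and getmtime; retry
            continue
        # wait for the download to be initiated
        if len(files) < num_file:
            time.sleep(1)
            print('waiting for download to be initiated')
        else:
            newest = files[-1]
            # Chrome gives in-progress downloads a .crdownload extension
            if newest.endswith('.crdownload'):
                time.sleep(1)
                print('waiting for download to complete')
            else:
                return newest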

Source post: "python selenium, find out when a download has completed?"
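
On the question of how to get the name: `os.listdir` returns plain strings, not file objects, so the `newest` value returned by `latest_download_file` is already the file's name. A minimal sketch of the rename step follows. `rename_download` and the target name are hypothetical; it uses `os.replace` rather than `os.rename` so that an existing file with the target name is overwritten on every platform instead of raising an error on Windows.

```python
import os


def rename_download(path, old_name, new_name):
    """Rename a finished download inside `path` and return the new full path.

    Hypothetical helper: `old_name` is the string returned by
    latest_download_file, `new_name` is the name you want to give it.
    """
    src = os.path.join(path, old_name)
    dst = os.path.join(path, new_name)
    # os.replace overwrites an existing dst; os.rename would raise on Windows
    os.replace(src, dst)
    return dst
```

In the per-candidate loop this could be called as `rename_download(download_dir, latest_download_file(i, download_dir), f"candidate_{i}.pdf")`, where `download_dir` and the naming scheme are assumptions. Because the renamed file still counts as an entry in the directory, the expected-count argument keeps working.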

Let me know if you have any suggestions. Thanks.

  • Any reason you aren't using requests to do the actual downloading? – goalie1998 Feb 04 '21 at 06:07
  • @goalie1998 I didn't post the download code since it's only a couple of lines and does what I want it to do. People are saying that Selenium has no control over the name of the downloaded file and that it has to be done at the OS level. – M. Albert Feb 04 '21 at 15:29
  • can you share the used URL? – Alin Stelian Feb 04 '21 at 15:57
  • [link](https://www.eslcafe.com/resumes), and there's a download button for each candidate (requires login). – M. Albert Feb 04 '21 at 16:11
  • There may be timing issues with checking the directory. The best way is to grab the response header coming back from the server. This would include the filename. You can then check specifically for that file to appear (with .crdownload at first...). So maybe fire an AJAX request for the file; you can then get the response header or XHR object (jqXHR in jQuery). – pcalkins Feb 04 '21 at 19:14
  • I bumped into that problem and just edited the question with my modified helper function for monitoring file changes: I added a parameter for the expected number of files, and it keeps waiting until the number of files matches that expected number. A potential issue is that if anything goes wrong, this could turn into an infinite loop, so I added some print statements so that at least the infinite loop can be spotted. Let me know if there are more elegant solutions. – M. Albert Feb 04 '21 at 19:24
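
pcalkins's suggestion relies on the server's response headers: a download usually carries a header such as `Content-Disposition: attachment; filename="..."`, and that name is normally what Chrome saves the file as (Chrome may still alter it, for example by adding a numeric suffix when a file with that name already exists). Python's standard `email.message.Message` can already parse this header, including the RFC 2231 `filename*=` form. A sketch, assuming you have the headers as a mapping (for instance from a `requests` response, as goalie1998 suggested); `filename_from_headers` is a hypothetical name.

```python
from email.message import Message


def filename_from_headers(headers):
    """Return the file name advertised by a Content-Disposition header, or None.

    `headers` is any mapping of response headers (hypothetical input, e.g. the
    `headers` attribute of a requests response).
    """
    value = None
    # header names are case-insensitive, so compare them lowercased
    for key, val in headers.items():
        if key.lower() == "content-disposition":
            value = val
            break
    if value is None:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the filename parameter and strips any quotes
    return msg.get_filename()
```

With the expected name known up front, the directory check only needs to wait for that exact file (and for its `.crdownload` counterpart to disappear) instead of guessing which file is newest.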

1 Answer


The second approach worked: just download, monitor the directory, and use os.rename after the download finishes.

  • 6
  • To rename the latest downloaded file, according to @farhanjatt's answer here, https://stackoverflow.com/questions/68565656/how-can-i-rename-the-latest-downloaded-file. – Mark K Aug 30 '23 at 03:17