I'm trying to download and rename files (around 60 per page) using Selenium and have hit a roadblock.
Here is what I have tried:
1. I tried the solution offered by supputuri, which goes through the chrome://downloads download manager. I used the code provided but ran into two issues: the opened tab does not close properly (which I can fix) and, more importantly, the helper function keeps returning None as the file name even though I can find the downloaded files in my download directory. This approach could work, but it probably needs some modification in the Chrome console command part, which I have no experience with.
# method to get the downloaded file name
def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time()+waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content #file-link').text")
        except:
            pass
        time.sleep(1)
        if time.time() > endTime:
            break
Source: "Selenium give file name when downloading"
2. The second approach I looked at is offered by Red, from the post below. Since I'm downloading one file at a time, I figured I could always find the most recent file, rename it once the download completes, and repeat the process. My issue with this approach: once I have grabbed the file object, I cannot find a way to get the file name. I checked the Python methods for file objects and none of them returns the name of the file.
import os
import time
def latest_download_file(num_file,path):
    os.chdir(path)
    while True:
        files = sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
        # wait for the file to finish downloading
        if len(files) < num_file:
            time.sleep(1)
            print('waiting for download to be initiated')
        else:
            newest = files[-1]
            if ".crdownload" in newest:
                time.sleep(1)
                print('waiting for download to complete')
            else:
                return newest
Source: "python selenium, find out when a download has completed?"
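For reference, here is a minimal sketch of the workflow I'm aiming for with the second approach, using only the standard library. It is not taken from either post: the names `snapshot`, `wait_for_new_file`, `rename_download` and `PARTIAL_SUFFIXES` are my own placeholders, and treating `.crdownload`, `.part` and `.tmp` as "still downloading" suffixes is an assumption about how the browser names partial files. The idea is to record the directory contents before each download, wait for a finished file that was not there before, and then rename it.

```python
import time
from pathlib import Path

# Assumption: files with these suffixes are partial downloads still in progress.
PARTIAL_SUFFIXES = {".crdownload", ".part", ".tmp"}


def snapshot(download_dir):
    """Return the set of file names currently present in download_dir."""
    return {p.name for p in Path(download_dir).iterdir() if p.is_file()}


def wait_for_new_file(download_dir, before, timeout=60.0, poll=0.5):
    """Wait for a finished file that is not in `before` and return its Path.

    Raises TimeoutError if nothing finished appears within `timeout` seconds.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = snapshot(directory) - before
        finished = [
            directory / name
            for name in new_names
            if Path(name).suffix.lower() not in PARTIAL_SUFFIXES
        ]
        if finished:
            # If several new files appeared, pick the most recently modified one.
            return max(finished, key=lambda p: p.stat().st_mtime)
        time.sleep(poll)
    raise TimeoutError(f"no finished download appeared in {directory}")


def rename_download(path, new_stem):
    """Rename `path` to `new_stem` (keeping its extension) and return the new Path."""
    target = path.with_name(new_stem + path.suffix)
    path.rename(target)
    return target
```

The intended usage would be: call `before = snapshot(download_dir)`, trigger the download in Selenium, then call `path = wait_for_new_file(download_dir, before)`. The file name is then just `path.name` (a plain string), and `rename_download(path, "some_new_name")` renames the file before the next download starts.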
Let me know if you have any suggestions. Thanks.