Basically, I have a Flask server that runs some Selenium commands when it receives a certain POST request. One of those commands downloads a file in the browser. The problem is that I can't capture the name of the downloaded file, so I can't do anything with it afterwards. There are other ways to find the file, like taking the most recently modified file in the download folder. But since this is a server, a high volume of concurrent requests would make that approach unreliable.

Is there any way to keep track of these downloaded files? Keep in mind that each request generates exactly one downloaded file, which belongs to that user's request.

  • You'd want to keep a list of all the files in the directory. Make sure to wait for each file to appear after triggering a download (will probably be ".crdownload" first...) before starting the next so that you can know what that file's name is. If there are separate threads/drivers running, you'd want each to have a unique folder. (You can also capture the file's name from the server's response, but again you need to wait for that to happen... the front-end usually does not know when the download has begun or ended.) Front-end detections would depend on the site... – pcalkins Aug 15 '23 at 21:14
  • So for each thread/request I would create a "download folder"? That makes sense – Diego Cândido Aug 15 '23 at 21:21
  • not for each request, but for each driver/browser pair if you are running multiple drivers at the same time. You'd set that browser's download folder to be unique to the thread. In each thread, though, you have to first trigger the download (click or whatever), then wait until a new file appears in the directory, update your file list... then you can run the rest of your selenium script... (also remember you'll need to check for that file to change from a partial download to a full download, but that you can do in its own thread...) – pcalkins Aug 15 '23 at 21:24
  • That's the problem. In each request I have a driver/browser pair. Basically the front-end sends a URL to the server and the server downloads the file at that URL. – Diego Cândido Aug 15 '23 at 21:26
  • Your answer really makes sense to me, but I don't know how I could avoid one folder per request – Diego Cândido Aug 15 '23 at 21:28
  • not sure we're using the term "request" in the same context. What request are you talking about? – pcalkins Aug 15 '23 at 21:30
  • I'm talking about the POST request that the front-end makes to the server. When the user makes a POST request, the server opens the browser using Selenium and downloads a file – Diego Cândido Aug 15 '23 at 21:31
  • not sure you'd want the overhead that launching all those browsers would bring. (browsers are resource hogs...) I'd look for another way to go about this... it'll actually be much easier to just have your server make the requests for the downloads directly. – pcalkins Aug 15 '23 at 21:38
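
The "snapshot the folder, trigger the download, wait for a new finished file" approach described in the comments could look roughly like the sketch below. It uses only the standard library. The helper names `list_names` and `wait_for_download` are hypothetical, and treating `.crdownload` and `.tmp` as the only partial-download suffixes is an assumption about how Chrome names in-progress files.

```python
import os
import time
from pathlib import Path

# Assumption: Chrome marks in-progress downloads with these suffixes.
PARTIAL_SUFFIXES = (".crdownload", ".tmp")


def list_names(folder):
    """Return the set of file names currently in `folder`."""
    return {p.name for p in Path(folder).iterdir() if p.is_file()}


def wait_for_download(folder, known_names, timeout=60.0, poll=0.25):
    """Wait for one new, completed file in `folder` and return its Path.

    `known_names` is the snapshot taken *before* the download was triggered.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = list_names(folder) - set(known_names)
        partial = [n for n in new_names if n.endswith(PARTIAL_SUFFIXES)]
        finished = sorted(n for n in new_names if not n.endswith(PARTIAL_SUFFIXES))
        # Only report a file once no partial download is left behind.
        if finished and not partial:
            return Path(folder) / finished[0]
        time.sleep(poll)
    raise TimeoutError(f"no completed download appeared in {folder}")
```

In a request handler you would call `list_names(folder)` first, trigger the click in Selenium, and then call `wait_for_download(folder, snapshot)`. This only stays unambiguous if each driver has its own folder, as suggested above.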

1 Answer

If your files were downloaded via Selenium in a Chrome browser, then they will appear in a list at this URL in the browser: chrome://downloads/

If you plan on reading that page via Selenium, note that the download entries (including their URLs) sit inside Shadow DOM / shadow root elements, so ordinary element lookups won't reach them directly. There are existing Stack Overflow posts on handling that: https://stackoverflow.com/search?q=shadow+chrome+downloads+python

If you want to change the default download location using Python Selenium, see: Downloading a file at a specified location through python and selenium using Chrome driver
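
As a rough sketch of that last point, the download location can be set per driver through Chrome's `prefs` experimental option, which also makes it possible to give each driver its own folder. The function names `build_download_prefs` and `make_driver` are hypothetical, and the example assumes Selenium 4 with a Chrome driver available on the machine.

```python
import tempfile


def build_download_prefs(download_dir):
    """Chrome preferences that send downloads to `download_dir` without a prompt."""
    return {
        "download.default_directory": str(download_dir),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir):
    """Start a Chrome driver whose downloads land in `download_dir`."""
    # Imported here so the prefs helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # One fresh folder per driver, so a new file can only belong to this driver.
    folder = tempfile.mkdtemp(prefix="selenium-dl-")
    driver = make_driver(folder)
    try:
        print("Downloads will be saved to", folder)
    finally:
        driver.quit()
```

With one folder per driver, any new file that shows up there can be attributed to that driver's request, which avoids relying on the "last modified file" heuristic.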

Michael Mintz