
I'm downloading a ton of files using Puppeteer, but I need to know each file's name before or after the download completes. Watching the folder for file changes doesn't solve my problem, because many processes download files at the same time and I have no way to match them.

I've been trying to set a custom download path for each file, but Puppeteer behaves oddly: some downloads go to that folder and others go to /Downloads.

So I would like to know whether there's a way to get the file name before the download, or to set the file name before downloading. That way I can match each file reliably in code.

Note: the files are downloaded via JavaScript, i.e. when a button is clicked. There is no way to get the file name by scraping, because it is auto-generated.

  • You probably need to know the file (download) URL, since the URL may contain the file name. Also, if you can, write a Node script that requests that URL directly. That way you know when the download has finished and can run a callback once it does. – plat123456789 Jun 12 '19 at 02:27
  • The Node package request: https://github.com/request/request – plat123456789 Jun 12 '19 at 02:29
  • Also, you need to add more information to your question, for example how the download button behaves, so that others can answer it. Maybe include some code. For details, see https://stackoverflow.com/help/how-to-ask – plat123456789 Jun 12 '19 at 02:56

2 Answers


If the download is triggered by the page, it is usually marked as a download through the Content-Disposition header. Very likely, that header also includes the file name.

Example

Below is an example of the header:

Content-Disposition: attachment; filename="name_of_download.ext"

To get the file name, you can therefore read the Content-Disposition header from response.headers(). In the following example, I then use a regular expression to extract the file name:

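// `response` is the Puppeteer response for the download request.
// Header names returned by response.headers() are lower-case.
const contentDisposition = response.headers()['content-disposition'];
// Guard against a missing header; [^"]* stops at the closing quote
const matchFilename = contentDisposition && contentDisposition.match(/filename="([^"]*)"/i);
const filename = matchFilename ? matchFilename[1] : null;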
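The snippet above assumes you already have the Puppeteer response for the download. One way to obtain it is to listen for the page's 'response' event, sketched below under the assumption that the download response is visible to Puppeteer's network events and is served with "Content-Disposition: attachment". The helper names filenameFromDisposition and watchDownloads are hypothetical.

```javascript
// Extract the quoted file name from a Content-Disposition header, or null.
function filenameFromDisposition(header) {
  if (!header) return null;
  const match = header.match(/filename="([^"]*)"/i);
  return match ? match[1] : null;
}

// Hypothetical helper: report every response on `page` that is sent as an
// attachment. Returns a function that removes the listener again.
function watchDownloads(page, onDownload) {
  const listener = (response) => {
    // Puppeteer lower-cases header names in response.headers()
    const disposition = response.headers()['content-disposition'];
    if (disposition && /^\s*attachment/i.test(disposition)) {
      onDownload({ url: response.url(), filename: filenameFromDisposition(disposition) });
    }
  };
  page.on('response', listener);
  return () => page.off('response', listener);
}
```

For example, call watchDownloads(page, ({ url, filename }) => console.log(filename, url)) before clicking the download button. Because each page gets its own listener, the reported name can be tied to the page (and click) that triggered it, even when several pages download at once.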

Non-ASCII characters

Depending on the files you are downloading, you might also want to check out this Stack Overflow answer regarding the encoding of non-ASCII file names.
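As a rough illustration of the non-ASCII case: RFC 6266 lets a server send an encoded filename* parameter (for example filename*=UTF-8''na%C3%AFve.txt) alongside, or instead of, the plain filename, and recommends preferring filename* when both are present. The hypothetical parser below handles only the UTF-8 form plus quoted and unquoted filename values; it is a sketch, not a complete RFC 6266 implementation.

```javascript
// Prefer the RFC 6266 `filename*` parameter (RFC 5987 encoding), then fall back
// to a quoted or unquoted `filename`. Only the UTF-8 charset is decoded.
function parseDispositionFilename(header) {
  if (typeof header !== 'string') return null;

  // filename*=charset'language'percent-encoded-value
  const extended = header.match(/filename\*\s*=\s*([^']*)'[^']*'([^;\s]+)/i);
  if (extended && extended[1].toLowerCase() === 'utf-8') {
    try {
      return decodeURIComponent(extended[2]);
    } catch (err) {
      // Malformed percent-encoding: fall through to the plain parameter
    }
  }

  // filename="quoted value" (backslash escapes are allowed inside the quotes)
  const quoted = header.match(/filename\s*=\s*"((?:[^"\\]|\\.)*)"/i);
  if (quoted) return quoted[1].replace(/\\(.)/g, '$1');

  // filename=token (no quotes)
  const token = header.match(/filename\s*=\s*([^;\s"]+)/i);
  return token ? token[1] : null;
}
```

With this, "attachment; filename*=UTF-8''na%C3%AFve.txt" yields naïve.txt, while a header with only filename="plain.txt" still yields plain.txt.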

Thomas Dondorf

You can create a directory and use fsPromises.readdir from Node.js's File System module to list its contents. Then change the page's download behavior to redirect downloads to this directory, trigger the download, call fsPromises.readdir again, and compare the new listing with the old one.
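A minimal sketch of this approach, assuming Chrome marks in-progress downloads with a .crdownload suffix and that page.target().createCDPSession() and the DevTools Page.setDownloadBehavior command are available in your Puppeteer and Chrome versions (newer Chrome versions mark that command deprecated in favor of Browser.setDownloadBehavior). The function names waitForNewFile and downloadAndGetName are hypothetical. Giving each page its own directory keeps concurrent downloads from being confused with one another.

```javascript
const fsPromises = require('fs').promises;
const path = require('path');

// Poll `dir` until a file that was not in the `before` snapshot appears,
// skipping Chrome's temporary ".crdownload" files (assumption).
async function waitForNewFile(dir, before, { timeoutMs = 30000, intervalMs = 200 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const now = await fsPromises.readdir(dir);
    const added = now.filter((name) => !before.has(name) && !name.endsWith('.crdownload'));
    if (added.length > 0) return added[0];
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No new file appeared in ${dir} within ${timeoutMs} ms`);
}

// Hypothetical helper: redirect this page's downloads to `dir`, run `trigger`
// (e.g. a click on the download button), and resolve with the new file's name.
async function downloadAndGetName(page, dir, trigger) {
  const downloadPath = path.resolve(dir); // the DevTools command expects an absolute path
  await fsPromises.mkdir(downloadPath, { recursive: true });
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath });
  // Snapshot the directory before the download starts
  const before = new Set(await fsPromises.readdir(downloadPath));
  await trigger();
  return waitForNewFile(downloadPath, before);
}
```

For example, const name = await downloadAndGetName(page, './downloads/job-1', () => page.click('#download')), where the directory and the selector are placeholders.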

ad347