I've been trying to use Puppeteer to download PDF files from a specific website, but how do I get it to download all of the files? For example:

One file on the website is at example.com/Contents/xxx-1.pdf, and a second file is at example.com/Contents/xxx-2.pdf.

How can I use Puppeteer to download the files automatically by incrementing the number in the URL?

Vuk
  • Does this answer your question? [How to download file with puppeteer using headless: true?](https://stackoverflow.com/questions/49245080/how-to-download-file-with-puppeteer-using-headless-true) – Aidan Dec 12 '21 at 09:48
  • I did check that, but unfortunately it doesn't. The website I need documents from has PDF files in a specific folder, for example /Contents/Thesis1.pdf and /Contents/Thesis2.pdf. I'm trying to use Puppeteer to automatically append the sequential number and download each PDF to my computer. Do you know a solution for that? – Vuk Dec 12 '21 at 10:00
  • Is there any kind of DRM that is preventing you from just using HTTP/HTTPS libraries? – Aidan Dec 12 '21 at 10:01

1 Answer

I've made a function that takes two arguments: a function that maps an index to the URL of the PDF to download, and a count that limits the number of downloads. It then tries to download each PDF in turn.

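const puppeteer = require('puppeteer');

// page.goto() needs an absolute URL, so the protocol is included here.
downloadFiles((i) => `https://example.com/Contents/xxx-${i}.pdf`, 20);

async function downloadFiles(url, count) {
    const browser = await puppeteer.launch({
        // In many Puppeteer/Chrome versions page.pdf() only works in headless mode.
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();
    // The files in the question are numbered from 1 (xxx-1.pdf, xxx-2.pdf, ...).
    for (let i = 1; i <= count; i++) {
        // await lets the url callback be either synchronous or async.
        const pageUrl = await url(i);
        try {
            await page.goto(pageUrl);
            // page.pdf() prints whatever the page renders into a new PDF file;
            // it does not copy the original file's bytes.
            await page.pdf({
                path: `pdf-${i}.pdf`,
                format: 'A4',
                printBackground: true
            });
        } catch (e) {
            // A missing file or a failed navigation is logged and skipped.
            console.log(`Error loading ${pageUrl}: ${e.message}`);
        }
    }
    await browser.close();
}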
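
If the files are publicly reachable without a login or DRM (the point raised in the comments), it may be simpler to skip the browser and save the raw bytes directly. The following is only a minimal sketch under that assumption. It relies on the global fetch that ships with Node 18+. The helper names buildUrls and downloadPdfs, and the fetchImpl parameter, are hypothetical names introduced for illustration and are not part of Puppeteer.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Build the URLs for indices 1..count from a function that maps index -> URL.
function buildUrls(urlFor, count) {
    return Array.from({ length: count }, (_, k) => urlFor(k + 1));
}

// Download each URL's raw bytes into outDir and return the saved file paths.
// fetchImpl defaults to the global fetch (assumes Node 18 or newer).
async function downloadPdfs(urls, outDir, fetchImpl = fetch) {
    await fs.mkdir(outDir, { recursive: true });
    const saved = [];
    for (const fileUrl of urls) {
        try {
            const res = await fetchImpl(fileUrl);
            if (!res.ok) {
                // e.g. a 404 when the sequence has a gap or has ended
                console.log(`Skipping ${fileUrl}: HTTP ${res.status}`);
                continue;
            }
            // Reuse the remote file name (xxx-1.pdf, xxx-2.pdf, ...).
            const name = path.basename(new URL(fileUrl).pathname);
            const target = path.join(outDir, name);
            await fs.writeFile(target, Buffer.from(await res.arrayBuffer()));
            saved.push(target);
        } catch (e) {
            console.log(`Error downloading ${fileUrl}: ${e.message}`);
        }
    }
    return saved;
}
```

With these helpers, something like downloadPdfs(buildUrls((i) => `https://example.com/Contents/xxx-${i}.pdf`, 20), './pdfs') would attempt files 1 through 20 and skip any that fail. If the site does require a logged-in session, this plain HTTP approach would likely not be enough on its own.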