I have created a Puppeteer script to scrape some images from a website. I'm storing the image URLs in an array and I want to save each image using fs. How can I grab the image and its name from the URL to pass them to the writeFileSync function?

This is an example of an image URL: https://www.examplesite.com/wp-content/uploads/2016/08/cat-500x500.jpg

  let images = [];
  page.on('response', (response) => {
    const url = response.url();
    if (url.startsWith('https://www.examplesite.com/wp-content/uploads/')) {
      images.push(url);
      console.log(images);
    }
  });
  page.goto('https://www.examplesite.com/shop/?product_count=150', {waitUntil: ['load', 'networkidle2']});

  page.waitForNavigation().then(() => {
    page.goto('https://www.examplesite.com/shop/page/2/?product_count=150', {waitUntil: ['load', 'networkidle2']})
      .then(() => {
        images.forEach((img) => {
          // I'm trying to save the image by passing the url to the fs.writeFileSync but without success.
          fs.writeFileSync(www, img);
        });
      });
  });
newbiedev
  • Does this answer your question? [How to download a file with Node.js (without using third-party libraries)?](https://stackoverflow.com/questions/11944932/how-to-download-a-file-with-node-js-without-using-third-party-libraries) – cseitz Mar 08 '21 at 20:09
  • @cseitz Not at all. How can I get the last part of the URL that contains the filename? – newbiedev Mar 08 '21 at 20:16
  • You're right. I'll look into how to fetch all the images, but that link shows you how to save them. Your question is a two-parter. – cseitz Mar 08 '21 at 20:18
  • Thank you for the link. I didn't know it was possible to fetch the resource with the `http` module and then pass it to `fs`. I'm implementing it, but I still need to solve the problem of how to get the filenames from the URL. I've also noticed that not all the image links are pushed into the array, even when I pass the query param to show 150 elements per page. – newbiedev Mar 08 '21 at 20:31
  • Perhaps [this one](https://stackoverflow.com/questions/46377955/puppeteer-page-evaluate-queryselectorall-return-empty-objects) answers the other half? You can do `querySelectorAll('img')` to get all the images and filter the ones you want to download. – cseitz Mar 08 '21 at 20:34
  • I was relying on the responses since I've implemented Puppeteer's request interceptor. I think the problem is that WooCommerce is lazy-loading the images, but I'm not sure. – newbiedev Mar 08 '21 at 20:45
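
The two parts discussed in the comments (getting the filename from the URL, then fetching the resource and handing it to `fs`) could be sketched roughly as below. This is only a minimal sketch using Node's built-in `https`, `fs`, and `path` modules, not a tested solution for the site in question. The helper names `fileNameFromUrl` and `downloadImage` are hypothetical, and the sketch assumes the images are served over HTTPS, need no cookies or redirects, and that the output directory already exists.

```javascript
const fs = require('fs');
const https = require('https');
const path = require('path');

// Hypothetical helper: take the last segment of the URL path as the file name.
// Only `pathname` is used, so query strings and fragments are ignored.
function fileNameFromUrl(imageUrl) {
  const { pathname } = new URL(imageUrl);
  return path.posix.basename(pathname);
}

// Hypothetical helper: stream one image to disk.
// Resolves with the path of the written file; assumes `outputDir` exists.
function downloadImage(imageUrl, outputDir) {
  const target = path.join(outputDir, fileNameFromUrl(imageUrl));
  return new Promise((resolve, reject) => {
    https.get(imageUrl, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode} for ${imageUrl}`));
        return;
      }
      const file = fs.createWriteStream(target);
      res.pipe(file);
      file.on('finish', () => resolve(target));
      file.on('error', reject);
    }).on('error', reject);
  });
}

console.log(fileNameFromUrl('https://www.examplesite.com/wp-content/uploads/2016/08/cat-500x500.jpg'));
// prints "cat-500x500.jpg"
```

With the `images` array from the question, something like `Promise.all(images.map((url) => downloadImage(url, './images')))` would save every collected URL. Since the `response` listener may see the same URL more than once, deduplicating first with `[...new Set(images)]` is probably worthwhile.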

0 Answers