I have the following:

const express = require('express');
const app = express();
const port = 3000;

const rp = require('request-promise');
const cheerio = require('cheerio');

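const options = {
    uri: `https://www.zoopla.co.uk/for-sale/details/51403409?search_identifier=620f645adbd75033ae2faf2cffdcfc3a`,
    // Browser-like User-Agent; without one the server may return an error page instead of the listing
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
    },
    // Parse the response body with cheerio so that .then() receives a cheerio $ function
    transform: function (body) {
        return cheerio.load(body);
    }
};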

rp(options)
    .then(($) => {
        // Run the selector once and log the src attribute of every <img> it matches
        $('img').each((i, el) => {
            console.log($(el).attr('src'));
        });
    })
    .catch((err) => {
        console.error(err);
    });

app.get('/', (req, res) => res.send('Hello World!'));

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

Although the class is in the markup, the array on the /results page is returning empty.

I suspect the img classes are added dynamically, which may be why they are not being picked up.

Is there a way around that if that's the case?
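One way to test the "added dynamically" theory is to look at exactly what the scraper receives. Browser DevTools shows the live DOM after scripts have run, while request-promise and cheerio only see the raw HTML the server sends. This is a minimal sketch, assuming Node.js 18+ for the built-in `fetch`; `countClassTokens` and `checkPage` are hypothetical helper names, and the regex is a rough debugging aid, not an HTML parser.

```javascript
// Sketch: does the raw HTML (what cheerio sees) contain elements with a given class?
// ASSUMPTION: Node.js 18+ so that the global fetch() is available.

// Count class attributes that contain className as a whole token.
// Rough check only: a regex is not a substitute for a real HTML parser.
function countClassTokens(html, className) {
  // Require whitespace before "class" so attributes like data-class are ignored
  const classAttr = /\sclass\s*=\s*(["'])(.*?)\1/gis;
  let count = 0;
  for (const match of html.matchAll(classAttr)) {
    const tokens = match[2].split(/\s+/);
    if (tokens.includes(className)) count++;
  }
  return count;
}

// Fetch the page the way a scraper does (no JavaScript is executed) and report
// the HTTP status plus how many elements carry the class in the raw HTML.
async function checkPage(url, className, userAgent) {
  const response = await fetch(url, { headers: { 'User-Agent': userAgent } });
  const html = await response.text();
  return {
    status: response.status,
    ok: response.ok,
    matches: countClassTokens(html, className),
  };
}
```

For example, `checkPage('https://www.zoopla.co.uk/for-sale/details/51403409', 'dp-gallery__list-item', ua).then(console.log)`, with `ua` set to the same User-Agent string as in `options`. A non-2xx `status` suggests a blocked or error page; a 2xx status with `matches: 0` while DevTools finds the elements suggests they are inserted by JavaScript, in which case a plain HTTP request plus cheerio will not see them.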

UPDATE

I have updated the code; however, I am not getting the expected result:

[Screenshot of the output]

I would expect to find URLs of .png or .jpg images.
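If the `src` values that come back are not the expected `.png`/`.jpg` URLs, one common cause is lazy loading: a gallery keeps a placeholder in `src` and stores the real address in another attribute. The helper below is a sketch under that assumption; the attribute names `data-src` and `srcset` are widespread conventions, not something confirmed for this page, and `pickImageUrl` is a hypothetical name. It takes a plain attribute object shaped like cheerio's `element.attribs`, resolves relative URLs with the WHATWG `URL` class, and keeps only `.jpg`, `.jpeg` or `.png` paths.

```javascript
// Sketch: pick a usable image URL from an <img> element's attributes.
// ASSUMPTION: a lazy-loading gallery may keep a placeholder in `src` and put the
// real URL in `data-src` or `srcset`; these names are common conventions only.

const IMAGE_EXTENSIONS = /\.(?:jpe?g|png)$/i;

// `attribs` is a plain object of attribute name -> value (the shape of cheerio's
// element.attribs); `pageUrl` is used to resolve relative URLs.
function pickImageUrl(attribs, pageUrl) {
  const candidates = [
    attribs['data-src'],
    // srcset looks like "url1 1x, url2 2x"; take the URL part of the first entry
    attribs.srcset && attribs.srcset.split(',')[0].trim().split(/\s+/)[0],
    attribs.src,
  ];
  for (const candidate of candidates) {
    if (!candidate || candidate.startsWith('data:')) continue; // skip inline placeholders
    let url;
    try {
      url = new URL(candidate, pageUrl); // resolves relative and protocol-relative URLs
    } catch {
      continue; // not a parseable URL
    }
    // Test only the path, so query strings such as ?w=800 do not hide the extension
    if (IMAGE_EXTENSIONS.test(url.pathname)) return url.href;
  }
  return null;
}
```

Inside the `.then(($) => ...)` callback it could be used as `$('img').each((i, el) => { const url = pickImageUrl(el.attribs, options.uri); if (url) console.log(url); })`. A `.gif` such as the site logo mentioned in the comments would be filtered out.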

2nd UPDATE

[Screenshot: output of logging `$('.dp-gallery__list-item img')`]

Aessandro
  • Have you tried logging `$('.gallery-thumbs-list img')` to the console to see what it returns? If you look at the page source, there are elements that match `.gallery-thumbs-list img`, so it doesn't appear that they are being loaded dynamically (at least not all of them, anyway). – Melbourne2991 May 16 '19 at 10:19
  • If I run `console.log($('img').length)` in the Node console, I get 1, and it refers to http://media.rightmove.co.uk/sitelogo.gif – Aessandro May 16 '19 at 11:01
  • The web server detected that you are not using a browser to load the page content. For me, it served an error page without any `.gallery-thumbs-list` elements. Using a `User-Agent` header for the request should make it work. – Carsten May 16 '19 at 11:14
  • I meant console.log in the Node process. @corschdi Ah yep, that's probably the issue. – Melbourne2991 May 16 '19 at 11:21
  • I have updated the code with the User-Agent header, but I don't seem to be getting a list of all the image URLs (src attributes). – Aessandro May 16 '19 at 13:39
  • I have disabled JavaScript, and only 1 image with that class comes through in the HTML, so the content for those images is dynamic. I guess I will need to find a way to load it afterwards. – Aessandro May 16 '19 at 15:34
  • I see them in `.dp-gallery__list-item img` – pguardiario May 16 '19 at 23:12
  • @pguardiario If I console.log your suggestion, I see the object (screenshot attached under 2nd update), but I don't see any img tags or anything related to them. Also, `$('.dp-gallery__list-item img').length` returns 0. – Aessandro May 17 '19 at 08:43
  • There are two things: in the Chrome element inspector, press Ctrl+F and paste that selector; you will see 9 of them. If cheerio can't find them, though, you might be getting a bad response. – pguardiario May 17 '19 at 23:20

0 Answers