
I'm trying to scrape things to do in Cartagena, Colombia from Tripadvisor using Puppeteer in JavaScript.

https://www.tripadvisor.com/Attractions-g297476-Activities-c42-Cartagena_Cartagena_District_Bolivar_Department.html

I want to grab the title of each listing, the links, the photos and the descriptions. However, I'm still stuck on retrieving the listing titles. I'm not sure whether I'm passing in the wrong data or something else is off, but I need help fixing my code, please. Also, I know there's a Tripadvisor API. I don't want to use it, so please don't suggest it. I want to create my own scraper, thank you.

Here's my JavaScript code: [image of code]

Here's the error message I keep getting: [image of error message]

I want to scrape the title of each listing, the links, the photos and the descriptions.

Webbie3
  • [Why should I not upload images of code/data/errors?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors) – ggorlen May 24 '23 at 05:02
  • Does this answer your question? [Why does jQuery or a DOM method such as getElementById not find the element?](https://stackoverflow.com/questions/14028959/why-does-jquery-or-a-dom-method-such-as-getelementbyid-not-find-the-element) – Raymond Chen May 24 '23 at 08:48

1 Answer


You should post your code directly as text, not as images of code. Basically, your selector is wrong.

To fix your code, replace:

const places = await page.evaluate(() => Array.from(document.querySelectorAll('#lithium-root .jemSU'), (e) => ({title: e.querySelector('.VLKGO').innerText})));

with

const places = await page.evaluate(() => Array.from(document.querySelectorAll('article'), (e) => ({title: e.querySelector('.VLKGO a:not([class])').innerText})));

or with

let places = await page.$$eval('article .VLKGO span > div', el => el.map(x => x.textContent));
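Note that a version calling `e.querySelector(...).innerText` throws a TypeError whenever a card has no matching element, because `querySelector` returns `null` in that case. A minimal sketch of a null-safe lookup follows; `textOrNull` is a hypothetical helper name, not part of Puppeteer or of the original code, and whether this matches the error in the question is an assumption since the error was posted as an image.

```javascript
// Hypothetical helper (not a Puppeteer API): return the trimmed text of the
// first element under `root` matching `selector`, or null if nothing matches.
// `root` can be any object with a querySelector method, e.g. a DOM element.
function textOrNull(root, selector) {
  const el = root.querySelector(selector);
  return el ? el.textContent.trim() : null;
}
```

Functions defined on the Node side are not visible inside `page.evaluate`, so a helper like this would have to be declared inside the callback you pass to `page.evaluate` or `page.$$eval`. Cards that come back with `null` can then be filtered out instead of crashing the whole run.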

However, if you want to get more info from the page, you should do something like the following code (or use .map instead of the for loop):

const puppeteer = require("puppeteer");

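let browser; // declared outside so the finally() handler below can close it
(async () => {
    // assign to the outer variable; redeclaring it with const here would shadow it
    browser = await puppeteer.launch({headless: false});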
    const page = await browser.newPage();

    let url = 'https://www.tripadvisor.com/Attractions-g297476-Activities-c42-Cartagena_Cartagena_District_Bolivar_Department.html';

    await page.goto(url, {waitUntil: 'load', timeout: 15000});
    await page.waitForSelector('main');

    let places = await page.$$('article');

    let data = [];
    for (let place of places) {
        // note removing 1., 2. from the name, if you need it intact just get el.textContent
        let header = await place.$eval('.VLKGO a:not([class])', el => { return {name : el.textContent.split('.').pop().trim(), link : el.getAttribute('href')}});

        let image = await place.$eval('picture > img[srcset]', el => el.getAttribute('srcset')); 
        image = image.split(',').pop().replace(/2x/gi, '').trim(); // get largest image, remove other link, remove 2x from the string.

        let desc = await place.$eval('a:not([class]) > div > span', el => el.textContent.trim());

        let by = await place.$eval('.VLKGO div > div > div > a', el => {return { name : el.textContent.replace('By ', '').trim(), link : el.getAttribute('href')}});

        let price = await place.$eval('[data-automation=cardPrice]', el => el.textContent);
        let priceTxt = await place.$eval('div:nth-child(1) > div:nth-child(3):not([class])', el => el.textContent);

        data.push({name: header.name, link: header.link, desc : desc, image: image, price:price, priceTxt : priceTxt, by : by});
    }

    console.log(data);
    await browser.close();

})().catch(err => console.error(err)).finally(() => browser?.close());
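The two string clean-ups in the loop (dropping the "1." rank prefix and picking the largest image out of `srcset`) can also be pulled into small pure functions, which makes them easy to check without launching a browser. This is only a sketch: `stripRank` and `largestFromSrcset` are hypothetical names, the sample strings in the comments are made up, and the srcset parsing assumes the URLs themselves contain no commas (same assumption as the `split(',')` above).

```javascript
// Hypothetical helpers; the names are not from the original answer.

// Remove a leading rank such as "1. " or "12. " from a listing title.
// Unlike split('.').pop(), this leaves periods inside the name alone,
// e.g. a made-up title "2. St. Example Museum" keeps its "St.".
function stripRank(title) {
  return title.replace(/^\s*\d+\.\s*/, '').trim();
}

// Pick the candidate with the largest descriptor from a srcset string such as
// "a.jpg 1x, b.jpg 2x" and return only its URL (null for an empty string).
// Assumes no commas inside the URLs themselves.
function largestFromSrcset(srcset) {
  const candidates = srcset
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => {
      // A candidate without a descriptor counts as 1x.
      const [url, descriptor = '1x'] = part.split(/\s+/);
      return { url, size: parseFloat(descriptor) || 1 };
    });
  if (candidates.length === 0) return null;
  return candidates.reduce((best, c) => (c.size > best.size ? c : best)).url;
}
```

In the loop above you would call these on the Node side, for example `stripRank(rawTitle)` on the text returned by `$eval` and `largestFromSrcset(image)` on the `srcset` attribute value.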

idchi