How to get the content of a div tag when scraping with puppeteer and NodeJs

Question

I heard about a library called Puppeteer and its usefulness for scraping web pages, so I decided to scrape the content of a gaming site so that I can store its data and go through it later.

But after I copied the XPath of the div tag whose content I want Puppeteer to scrape, it returns an empty string. What am I doing wrong?

This is the URL I am trying to scrape (it is the one passed to scrapeData in the code below).

I want to scrape the div tag where the results of the 6 different colored balls are displayed, so that I can get the numbers of those colors every 45 seconds.

const puppeteer = require("puppeteer");

async function scrapeData(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);

  const [dataReceived] = await page.$x('/html/body/div[1]/div/div/div/footer/div[2]/div[1]/div/div[1]/div[2]/div/div');
  const elContent = await dataReceived.getProperty('textContent');
  const elValue = await elContent.jsonValue();
  console.log({ elValue });
  //console.log(elContent);
  //console.log(dataReceived)
  browser.close();
}

scrapeData("https://logigames.bet9ja.com/Games/Launcher?gameId=11000&provider=0&sid=&pff=1&skin=201");
console.log("just testing");

score 1 · Answer 1 · answered Jan 08 '21 at 02:51

Rather than using page.$x here, you could use a simpler selector, which would be less brittle. Try page.$('.ball-value'), or possibly page.waitForSelector('.ball-value') to deal with transition times. Testing on that page with the simpler selector seems to work. If you want all the ball values rather than just the first one, there's page.$$, which runs document.querySelectorAll in the page and returns an array of element handles.
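A minimal sketch of what this suggestion might look like in code. The `.ball-value` selector comes from the answer; the helper names `readBallValues` and `scrapeOnce`, the 30-second timeout, and the `networkidle2` wait are assumptions made for illustration, and the sketch has not been checked against the live page.

```javascript
/**
 * Reads the trimmed text of every element matching `selector`.
 * `page` is expected to be a Puppeteer Page.
 * The helper name and default timeout are illustrative assumptions.
 */
async function readBallValues(page, selector = ".ball-value", timeout = 30000) {
  // Resolves once at least one matching element exists in the DOM,
  // and rejects if none appears before the timeout.
  await page.waitForSelector(selector, { timeout });

  // $$eval runs the callback inside the browser with all matched elements
  // and returns the (serializable) result to Node.
  return page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()));
}

// Hypothetical driver: launches a browser, reads the values once, then closes.
async function scrapeOnce(url) {
  const puppeteer = require("puppeteer"); // loaded here so the helper above stays reusable
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    console.log(await readBallValues(page));
  } finally {
    // Close the browser even if navigation or scraping throws.
    await browser.close();
  }
}

// Example call (commented out so that loading this file has no side effects):
// scrapeOnce("https://logigames.bet9ja.com/Games/Launcher?gameId=11000&provider=0&sid=&pff=1&skin=201");
```

Reading the text with `$$eval` avoids the `getProperty('textContent')` and `jsonValue()` round trip per element. Note that `waitForSelector` only guarantees that the elements exist, not that their text has been filled in yet.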

answered Jan 08 '21 at 02:51 by Zac Anger

Thanks for your help @Zac Anger. I tried all the solutions you provided and they work, but only for normal static HTML elements on that page. Elements and attributes that are created dynamically by some kind of script do not work. – Bill Mayheptad Ritchie Jan 09 '21 at 01:41
E.g. when I do page.$$('.ball-value'); it works, because the class .ball-value is a hardcoded attribute on one of the selected div tags. But when I try to getProperty innerHTML of the div, it returns '
' whereas the element has class="ball ball-green", where the green part is added dynamically every 45 seconds. The same thing happens if I try to get the textContent of the div, which is the ball number: it returns an empty string, since the class ball-green and the text content are added dynamically every 45 seconds. Do you have any other solution for this? – Bill Mayheptad Ritchie Jan 09 '21 at 01:54
I'm not sure if it would work, but you could try watching for changes as in [this question](https://stackoverflow.com/questions/50392736/puppeteer-how-to-listen-for-on-innerhtml-change) or [this one](https://stackoverflow.com/questions/12421491/can-i-monitor-changes-on-an-html-div). – Zac Anger Jan 09 '21 at 02:58
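One way to approach the dynamically filled content described in these comments might be to wait until the elements actually contain text before reading them. The sketch below uses Puppeteer's `page.waitForFunction` with `polling: "mutation"`, which re-checks the predicate whenever the DOM changes. The names `allBallsFilled` and `waitForBallResults`, the 60-second timeout, and the assumption that the color class sits on the same element as `.ball-value` are all illustrative; the real page structure may differ.

```javascript
// Runs inside the browser: true once every matched element has non-empty text.
// (Hypothetical predicate; Puppeteer serializes it and evaluates it in the page.)
function allBallsFilled(sel) {
  const els = Array.from(document.querySelectorAll(sel));
  return els.length > 0 && els.every((el) => el.textContent.trim() !== "");
}

/**
 * Waits until the script on the page has filled in the ball elements,
 * then returns each element's text and class list.
 * `page` is expected to be a Puppeteer Page.
 */
async function waitForBallResults(page, selector = ".ball-value", timeout = 60000) {
  // polling: "mutation" re-runs the predicate on every DOM mutation,
  // so dynamically inserted text is noticed as soon as it appears.
  await page.waitForFunction(allBallsFilled, { timeout, polling: "mutation" }, selector);

  return page.$$eval(selector, (els) =>
    els.map((el) => ({
      text: el.textContent.trim(),
      // Assumption: the color class (e.g. "ball-green") is on this element;
      // if it is on a parent, read el.parentElement.className instead.
      classes: el.className,
    }))
  );
}
```

This would be called after `page.goto(url)`, for example `console.log(await waitForBallResults(page));`. To collect a new result every 45 seconds, the call could be repeated in a loop on the same page rather than reloading it each time.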
