
I am using the code below to extract an h4 that is nested inside seven levels of parent divs (parent div, grandparent div, and so on) on this website. The h4 I want contains the text "Claimed".

It doesn't work because that h4 isn't retrieved.

const puppeteer = require('puppeteer')
async function run() {

    const browser = await puppeteer.launch({
        headless: false,
        ignoreHTTPSErrors: true,
    })

    var x = 1101;
    while (x !== 0) {
        const page = await browser.newPage();
        await page.setRequestInterception(true);

        page.on('request', (req) => {
            if (req.resourceType() == 'image' || req.resourceType() == 'font') {
                req.abort();
            }
            else {
                req.continue();
            }
        });
        page.setDefaultTimeout(0);
        await page.goto(`https://play.projectnebula.com/planet/${x}`);
        await page.waitForSelector('h4');
        const elements = await page.$$("h4");

        let text = await (await elements[elements.length - 2].getProperty("innerText")).jsonValue()
        text = text.toLowerCase().trim();
        if (text == 'claimed' || text == 'on sale') {
            console.log(text, x)
            x += 1;
            await page.close();
        }
        else {
            console.log(x, text)
            x = 0
        }
    }
}
run();

I am trying to find an unclaimed planet. The "unclaimed" and "claimed" statuses both appear in an h4. After working for 1 or 2 URLs, the code stops even though the planet is claimed, because the "Claimed" h4 isn't fetched. There are about 13 h4 elements on these pages. For the first or second URL all 13 are fetched, but for the next URL only 11 are fetched.

  • You need to `await` all Puppeteer calls that manipulate the browser. For your second part, canonical is [Why does headless need to be false for Puppeteer to work?](https://stackoverflow.com/questions/63818869/why-does-headless-need-to-be-false-for-puppeteer-to-work) – ggorlen Dec 04 '22 at 08:08
  • `page.waitForSelector('h4');` before your first `goto` doesn't make sense. Navigate first, then wait for things to show up. – ggorlen Dec 04 '22 at 08:14
  • @ggorlen it doesn't work; the h4 is still not received – Hafeez Ali Dec 04 '22 at 16:58
  • It [still](https://stackoverflow.com/questions/74666569/how-can-i-access-a-h4-element-with-no-class-but-is-a-child-of-div-and-div-is-als#comment131792245_74666569) works for me if I run this headfully and use `waitForSelector("h4")` after the goto. – ggorlen Dec 04 '22 at 17:06
  • @ggorlen it works for 2 or 3 links; after that it doesn't – Hafeez Ali Dec 05 '22 at 01:44
  • In what way does it fail? – ggorlen Dec 05 '22 at 01:46
  • @ggorlen after going through 1 or 2 URLs it doesn't fetch the **claimed** h4, while it still fetches the other h4 elements – Hafeez Ali Dec 05 '22 at 01:54
  • You mean it falls into the `else` branch? I'm not entirely sure what your goal is, I guess. What's the script supposed to do? You want to loop forward starting from planet 1101 until you see a `claimed` or `on sale` planet? – ggorlen Dec 05 '22 at 01:57
  • @ggorlen yes. I am trying to find unclaimed planets. If a planet is not claimed, I want the script to show me that planet. The problem is that the planet is claimed, but the code stops and shows me the page because the **claimed** h4 isn't fetched – Hafeez Ali Dec 05 '22 at 02:01
  • Using `.then()` after `waitForSelector` solved the problem – Hafeez Ali Dec 05 '22 at 02:18
  • Hmm. It's not good practice to mix `then` and `async`/`await`, so there should be no reason for `then` here--`await` does the same thing. – ggorlen Dec 05 '22 at 02:34
  • `await` and `then` are totally equivalent, just different syntactical ways to do the exact same thing. So it's cool you figured it out with `then`, but you could probably write whatever you wrote more clearly with `await`. Consider posting a [self answer](https://stackoverflow.com/help/self-answer) with your final solution. I'll offer feedback if I see it. – ggorlen Dec 05 '22 at 07:53
  • I saw your solution. The only functional difference is `waitForNetworkIdle`, not the `then` you added which is superfluous. – ggorlen Dec 05 '22 at 08:00

1 Answer


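After `await page.waitForSelector('h4')`, also call `await page.waitForNetworkIdle()`. `waitForSelector` resolves as soon as the first h4 appears, so on some pages only 11 of the roughly 13 h4 elements had rendered when the script read them. Waiting for the network to go idle gives the remaining h4 elements time to load: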

while (x !== 0) {
  const page = await browser.newPage();
  // we need to enable interception feature
  await page.setRequestInterception(true);

  page.on('request', (req) => {
      if (req.resourceType() == 'image' || req.resourceType() == 'font') {
          req.abort();
      }
      else {
          req.continue();
      }
  });
  page.setDefaultTimeout(0);
  await page.goto(`https://play.projectnebula.com/planet/${x}`);
  await page.waitForSelector('h4');
  // The first h4 may appear before the rest; wait for network activity to settle
  await page.waitForNetworkIdle();
  const elements = await page.$$("h4");
  let text = await (await elements[elements.length - 2].getProperty("innerText")).jsonValue()
  text = text.toLowerCase().trim();
  if (text == 'claimed' || text == 'on auction' || text == 'on sale') {
      console.log(text, x)
      x += 1;
      await page.close();
  }
  else {
      console.log(x, text)
      x = 0
  }
}
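
The code above depends on the status being the second-to-last h4, which is fragile when the number of h4 elements varies between pages. As a rough sketch of an alternative, the status could be found by scanning the text of every h4 for a known value instead of relying on its position. The helper names `normalize` and `findTakenStatus` and the `TAKEN_STATUSES` set are hypothetical and not part of Puppeteer; the status strings are taken from the condition in the loop above.

```javascript
// Statuses that mean the planet is already taken (copied from the loop's condition).
const TAKEN_STATUSES = new Set(['claimed', 'on auction', 'on sale']);

// Normalize text the same way the loop does before comparing.
function normalize(text) {
  return String(text).toLowerCase().trim();
}

// Hypothetical helper: return the first h4 text that matches a taken status,
// or null if none of the given texts match.
function findTakenStatus(h4Texts) {
  for (const raw of h4Texts) {
    const text = normalize(raw);
    if (TAKEN_STATUSES.has(text)) {
      return text;
    }
  }
  return null;
}
```

The texts could be collected with something like `await page.$$eval('h4', els => els.map(el => el.innerText))` and passed to `findTakenStatus`. This does not replace the wait: if the status h4 has not rendered yet, the helper returns `null` and the planet would wrongly look unclaimed, so `waitForNetworkIdle` (or a similar wait) is still needed before reading the page.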
Tyler2P