
I have been experimenting with Puppeteer's launch settings, and the error below does not occur when I launch with:

puppeteer.launch({ headless:false });

Fetching only fails when I set headless: true, which is the mode I actually want to use.

Error:

Logging into user...
Logged in!
Fetching username...
TypeError: Cannot read properties of undefined (reading 'getProperty')
    at D:\boody\Programming\blxscrape\src\index.ts:40:27
    at Generator.next (<anonymous>)
    at fulfilled (D:\boody\Programming\blxscrape\src\index.ts:5:58)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

Snippet of the code:

  // username 
  console.log('Fetching username...');
  const [el]: any = await page.$x('/html/body/div[3]/main/div[2]/div[2]/div/div[1]/div/div[2]/div[2]/div[1]/div[2]');
  let txt: any = await el.getProperty('textContent');
  const usernameTxt: string = await txt.jsonValue();

  // desc
  console.log('Fetching description...');
  const [el2]: any = await page.$x('/html/body/div[3]/main/div[2]/div[2]/div/div[3]/div/div[1]/div[1]/div[2]/div/pre/span');
  const descTxt: string = await page.evaluate((el2: any) => el2.textContent, el2);

  // profile picture 
  console.log('Fetching profile picture...');
  const [el3]: any = await page.$x('/html/body/div[3]/main/div[2]/div[2]/div/div[1]/div/div[2]/div[1]/span/thumbnail-2d/span/img');
  const src: any = await el3.getProperty('src');
  const pfpUrl: string = await src.jsonValue();

  const data: object = {usernameTxt, descTxt, pfpUrl};

  console.log(data);

  await browser.close();
}

scrape('https://www.roblox.com/users/64004875/profile', process.env.USER, process.env.PASS);
Boody
  • Did you try setting a user agent as suggested in [Why does headless need to be false for Puppeteer to work?](https://stackoverflow.com/questions/63818869/why-does-headless-need-to-be-false-for-puppeteer-to-work/70936552#70936552)? As an aside, I caution against [browser-generated selectors](https://serpapi.com/blog/puppeteer-antipatterns/#misusing-developer-tools-generated-selectors). – ggorlen Apr 25 '23 at 03:45
  • I actually have. It did not work. – Boody Apr 25 '23 at 03:45
  • It did fix the problem of getting blocked by a captcha, though. – Boody Apr 25 '23 at 03:46
  • Thanks for clarifying. Did you try some of the more advanced ideas in the thread, like [this answer](https://stackoverflow.com/a/63820507/6243352)? I'd only open a new question if you've tried everything there and none of it worked for your use case. When you do ask, please state everything you've tried from the existing threads so we know where to begin working from. – ggorlen Apr 25 '23 at 03:47
  • Thank you for your answer! I have found that the second method in the post does not fit my needs. I did find another solution, though: simply saving the login information, as described in https://stackoverflow.com/questions/48608971/how-to-manage-log-in-session-through-headless-chrome – Boody Apr 25 '23 at 05:06
  • Actually, after further testing, this did not completely fix the issue. The website still somehow detects that I am using a headless browser, logs me out, and asks me to solve a captcha. Note that I am using two plugins (extra stealth and anonymize UA) to keep my browser relatively undercover. – Boody Apr 25 '23 at 05:26
  • Yeah, `userDataDir` is useful for bypassing logins but I've not seen it to be an effective solution to being detected as a bot. – ggorlen Apr 25 '23 at 05:29
  • Try using `page.waitForXPath` instead of `page.$x` (https://pptr.dev/api/puppeteer.page.waitforxpath). – lezhumain Apr 25 '23 at 21:59
  • Is this all the information you need to scrape? If so, you don't need to log in to the site for any of it. – idchi Apr 26 '23 at 20:01
  • From my experience I do need it for the user's about section. – Boody Apr 26 '23 at 21:44
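
For context on the `TypeError` in the question: `page.$x` resolves to an empty array when nothing matches, so `const [el] = await page.$x(...)` leaves `el` undefined and `el.getProperty` throws. Following the suggestion in the comments to wait for the element rather than query it once, here is a minimal sketch of a guarded lookup. The helper name `readProperty`, the `PageLike`/`HandleLike` interfaces, and the 10-second default timeout are assumptions made for illustration; the interfaces only mirror the parts of Puppeteer's `Page` and `ElementHandle` that the sketch uses, and I have not verified it against every Puppeteer version.

```typescript
// Minimal structural types so the sketch compiles without Puppeteer installed.
// They mirror only the parts of Puppeteer's Page and ElementHandle used here.
interface HandleLike {
  getProperty(name: string): Promise<{ jsonValue(): Promise<unknown> }>;
}

interface PageLike {
  waitForSelector(
    selector: string,
    options?: { timeout?: number }
  ): Promise<HandleLike | null>;
}

// Hypothetical helper: wait for an element, then read one of its properties as text.
async function readProperty(
  page: PageLike,
  selector: string,
  property: string,
  timeoutMs: number = 10000
): Promise<string> {
  // Wait until the element exists instead of querying once, so a slow render
  // (or a captcha/login page) surfaces as a timeout or an explicit error.
  const handle = await page.waitForSelector(selector, { timeout: timeoutMs });
  if (!handle) {
    throw new Error(`No element matched "${selector}"`);
  }
  const prop = await handle.getProperty(property);
  const value = await prop.jsonValue();
  // Normalize null/undefined to an empty string and trim surrounding whitespace.
  return String(value ?? '').trim();
}
```

Usage would look something like `const usernameTxt = await readProperty(page, '.profile-name', 'textContent');`, where `.profile-name` is a made-up selector, not one taken from the site. As far as I know, recent Puppeteer versions also accept XPath expressions in `waitForSelector` via an `xpath/` prefix, while older versions expose `page.waitForXPath`. Note that this only turns the undefined-element crash into a clearer failure; it does not address the site detecting the headless browser.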
