I am a Python developer and very new to Puppeteer and JavaScript. I am trying to scrape a page, extract some specific links from it, and save those links in an array. I want this array to be global, defined outside the function. Here is my Python code for this:

base_url = "https://www.blablabla.com"
links = []
for a in soup.find_all('a', attrs={'class': "o-job-card"}, href=True):
    links.append(base_url + a['href'])

But my boss wants me to do the same thing with Puppeteer. I have come up with the solution below, but something is wrong: I can `console.log(my_links[i])` and see the links, but `links.push(my_links[i]);` does not seem to add anything to `links`, and I do not understand why. Can somebody explain this to me?

Here is the whole code:

const puppeteer = require('puppeteer');
async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    await page.goto('https://www.blablabla.com');
  
    return await page.evaluate(() =>
      Array.from(document.querySelectorAll('a.o-job-card[href]'), (a) => a.getAttribute('href'))
    );

  } catch (err) {
    console.error(err);
  }
}

let links = [];
var txt = 'https://www.blablabla.com';
let userToken = main();
userToken.then(function (my_links) {
  for (i = 0; i < my_links.length; i++) {
    my_links[i] = txt + my_links[i];
    links.push(my_links[i]);
  }
});

console.log(links);
Ada
  • You are pushing to `links`, but the `console.log(links)` fired before `main()` had finished running. If you log `links` inside the `.then` you'll see it is populated. See: [How do I return the response from an asynchronous call?](https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call) – pilchard Jan 25 '21 at 12:31
  • Thanks for the explanation. But `console.log` comes after `main()`. How can it fire before? I am confused. – Ada Jan 25 '21 at 13:08
  • Because it's an asynchronous function that returns a promise, not an array as you probably think. So you either need to put `console.log()` inside the `then` callback function or `await main()`. – pavelsaman Jan 25 '21 at 13:18
  • I believe I need to read about async functions. Thanks for the help. – Ada Jan 25 '21 at 13:57
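
To make the ordering described in the comments concrete, here is a minimal sketch. It does not use Puppeteer at all: `fakeMain` is a hypothetical stand-in for the question's `main()` that resolves with two made-up hrefs after a short delay, and `collectLinks`, `baseUrl`, and `ready` are names introduced only for this illustration. The point is just that code written after the call runs before the promise resolves, so the array is still empty at that moment.

```javascript
// Hypothetical stand-in for the Puppeteer-based main(): it resolves with
// relative hrefs after a short delay, the way an async scrape would.
function fakeMain() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(['/jobs/1', '/jobs/2']), 10);
  });
}

const baseUrl = 'https://www.blablabla.com';

// Wait for the scrape to finish, then turn relative hrefs into absolute links.
async function collectLinks(scrape) {
  const hrefs = await scrape();
  return hrefs.map((href) => baseUrl + href);
}

const links = [];

// The callback runs later, once the promise has resolved.
const ready = collectLinks(fakeMain).then((absolute) => {
  links.push(...absolute);
  console.log('inside then:', links); // populated here
  return links;
});

// This line runs first, while the promise is still pending.
console.log('right after the call:', links); // still []
```

Anything that needs the populated array would go inside the `then` callback, or after an `await` on the promise inside an `async` function; the global `links` array is filled either way, just not yet at the point where the original `console.log(links)` runs.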

0 Answers