There are a couple of things wrong here, but you are on a good path to getting this to work. The main problem is that you can't have await inside a try {} catch {} block unless the surrounding function is declared async; asynchronous JavaScript has its own way of dealing with errors. See: try/catch blocks with async/await.
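For reference, once the awaits live inside an async function, a try/catch around them works as you would expect. A minimal sketch (the function name and URL are just placeholders, and it assumes puppeteer is already imported):

async function scrapeSafely() {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');
    // ... actual scraping would happen here ...
    await browser.close();
  } catch (err) {
    // Any awaited promise that rejects above lands in this catch.
    console.error('Scraping failed:', err);
  }
}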
In your case, it's totally fine to write everything in one async function. Here is how I would do it:
const puppeteer = require('puppeteer'); // skip this if puppeteer is already imported at the top of your script

async function scrapeIfc() {
  const completeData = [];
  const url = 'https://www.ifc.org/wps/wcm/connect/news_ext_content/ifc_external_corporate_site/news+and+events/pressroom/press+releases';
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // setDefaultNavigationTimeout is synchronous and only affects subsequent
  // navigations, so call it before goto (0 disables the timeout).
  page.setDefaultNavigationTimeout(0);
  await page.goto(url);

  // Collect the links to the individual press releases from the overview page.
  const links = await page.evaluate(() =>
    Array.from(document.querySelectorAll('h3 > a')).map(anchor => anchor.href)
  );

  // Visit each press release sequentially and extract its data.
  for (const link of links) {
    const newPage = await browser.newPage();
    await newPage.goto(link);

    const data = await newPage.evaluate(() => {
      const titleElement = document.querySelector('td[class="PressTitle"] > h3');
      const contactElement = document.querySelector('center > table > tbody > tr:nth-child(1) > td');
      const txtElement = document.querySelector('center > table > tbody > tr:nth-child(2) > td');

      return {
        source: 'IFC',
        title: titleElement ? titleElement.innerText : undefined,
        contact: contactElement ? contactElement.innerText : undefined,
        txt: txtElement ? txtElement.innerText : undefined,
      };
    });

    completeData.push(data);
    await newPage.close();
  }

  await browser.close();
  return completeData;
}
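You could then call it like this while testing, for example at the bottom of your script (the logging is just an example):

scrapeIfc()
  .then(data => console.log(JSON.stringify(data, null, 2)))
  .catch(err => console.error('Scrape failed:', err));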
There are a couple of other things you should note:
- You have a bunch of unused imports (title, link, resolve and reject) at the head of your script, which might have been added automatically by your code editor. Get rid of them, as they might shadow the variables you actually use.
- I changed your document.querySelector calls to be more specific, as I couldn't select the actual elements on the IFC website with the original ones. You might need to revise them.
- For local development I use Google's functions-framework, which lets me run and test the function locally before deploying. If it errors on your local machine, it will also error once deployed to Google Cloud (see the first sketch after this list).
- (Opinion) If you don't need Firebase, I would run this with Google Cloud Functions, Cloud Scheduler and Cloud Firestore. For me, this has been the go-to workflow for periodic web scraping (see the second sketch after this list).
- (Opinion) Puppeteer might be overkill for scraping a simple static website, since it drives a full headless browser. Something like Cheerio is much more lightweight and much faster (see the last sketch after this list).
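Regarding local development, here is a rough sketch of how serving the scraper with the functions-framework could look. The function name scrape and the HTTP wrapper are assumptions, not part of your code:

// index.js: expose the scraper as an HTTP function so the
// functions-framework can serve it locally.
exports.scrape = async (req, res) => {
  try {
    const data = await scrapeIfc();
    res.json(data);
  } catch (err) {
    res.status(500).send(err.message);
  }
};

// Install and run locally:
//   npm install @google-cloud/functions-framework
//   npx functions-framework --target=scrape
// Then open http://localhost:8080 to trigger a scrape.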
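For the Cloud Functions + Cloud Scheduler + Firestore workflow, the rough idea is a Pub/Sub-triggered function that Cloud Scheduler invokes on a cron schedule and that writes the results to Firestore. This is only a sketch; the function name scheduledScrape and the collection name press_releases are made up:

const { Firestore } = require('@google-cloud/firestore');
const firestore = new Firestore();

// Background function bound to a Pub/Sub topic that
// Cloud Scheduler publishes to on a cron schedule.
exports.scheduledScrape = async (message, context) => {
  const items = await scrapeIfc();
  const batch = firestore.batch();
  for (const item of items) {
    // Auto-generated document IDs; one document per press release.
    batch.set(firestore.collection('press_releases').doc(), item);
  }
  await batch.commit();
};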
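And for comparison, fetching the list of links with Cheerio instead of Puppeteer could look roughly like this (assuming the content is in the initial HTML and not rendered by JavaScript; node-fetch is used here only as an example HTTP client):

const fetch = require('node-fetch');
const cheerio = require('cheerio');

async function getLinks(url) {
  const html = await (await fetch(url)).text();
  const $ = cheerio.load(html);
  // Same selector as in the Puppeteer version above.
  // Note: attr('href') can return relative URLs, unlike anchor.href in the browser.
  return $('h3 > a')
    .map((i, el) => $(el).attr('href'))
    .get();
}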
Hope I could help. If you encounter other problems, let us know. Welcome to the Stack Overflow community!