I've done a lot of debugging and read several articles, but I cannot figure out why I am getting a TimeoutError:

at Promise.then (/workspace/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111) name: 'TimeoutError' }

For the "goto" line below, I've tried adjusting the arguments, and I've tried downgrading Puppeteer in package.json from version 5 to 4 to 3. The code runs fine locally, but in the Google Cloud Function it keeps timing out. I verified that my VPC connector is working by writing a simple fetch function for google.com, so this appears to be purely a Puppeteer-in-GCF issue.

FYI, this function is triggered by a Pub/Sub topic.

const puppeteer = require('puppeteer')

const PUPPETEER_OPTIONS = {
  headless: true,
  args: [
    '--disable-gpu',
    '--disable-dev-shm-usage',
    '--disable-setuid-sandbox',
    '--no-first-run',
    '--no-sandbox',
    '--no-zygote',
    '--single-process',
    "--proxy-server='direct://'",
    '--proxy-bypass-list=*',
  ],
};

const closeConnection = async (page, browser) => {
  page && (await page.close());
  browser && (await browser.close());
};

exports.runScraper = async (message, context) => {
    const url = Buffer.from(message.data, 'base64').toString()
    console.log( `triggered with ${url}`)
    
    const browser = await puppeteer.launch(PUPPETEER_OPTIONS);
    const page = await browser.newPage();

    try // open url and get price and title
    {
        console.log( "awaiting goto")
        await page.goto(url, { waitUntil: 'networkidle2' })
        console.log( "awaiting evaluate")
        let item = await page.evaluate( async () => {
            let priceArray = document.querySelector('div.cAIbCF').innerText.split('.')
            return {
                title: document.querySelector('h1 > span').innerText,
                whole: priceArray[0],
                part: priceArray[1]
            }
        }) 
    } // try
    catch (error) {
        console.log( error );
        throw error;
    } finally {
        // No `return` here: returning from `finally` would swallow the error rethrown above.
        console.log( "finally closeConnection" );
        await closeConnection(page, browser);
    }
}
FaultyJuggler
  • Have you tried waiting for a different event? Based on [this](https://stackoverflow.com/questions/52497252/puppeteer-wait-until-page-is-completely-loaded), there are also `load`, `domcontentloaded`, and `networkidle0`. Since you use a specific selector later on, maybe it would be enough to wait for that selector? Besides, it may be an issue with the GCP machine itself: could you try executing a simple GET request (against the URLs used in production, not necessarily google) using Node's `https` module instead of Puppeteer? – Marek Piotrowski Aug 22 '20 at 11:53
  • @MarekPiotrowski I added the selector in hopes it might fix the issue; the first few runs were without the "waitUntil" – FaultyJuggler Aug 22 '20 at 14:12
  • @MarekPiotrowski and I've tried "load" and "domcontentloaded", but not "networkidle0" – FaultyJuggler Aug 22 '20 at 14:18
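
The plain GET check suggested in the comments could look roughly like the sketch below. It uses only Node's built-in `https` module; the helper name `getStatus` and the 10-second default timeout are assumptions made for illustration, not part of the original code.

```javascript
const https = require('https');

// Hypothetical helper: resolves with the HTTP status code of a GET request,
// or rejects on a network error or after timeoutMs of socket inactivity.
function getStatus(url, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, (res) => {
      res.resume(); // discard the body; only the status code matters here
      resolve(res.statusCode);
    });
    // Abort the request if the socket stays idle for too long.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error(`request timed out after ${timeoutMs} ms`));
    });
    req.on('error', reject);
  });
}
```

Calling `getStatus(url)` from inside the Cloud Function with the same URL that Puppeteer receives would show whether the target site is reachable from GCF at all, independent of Chromium.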

1 Answer

I had a similar problem. I changed

await page.goto(url, { waitUntil: 'networkidle2' })

to

await page.goto(url, {
    waitUntil: 'load',
    timeout: 0
});

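and it worked. Note that `timeout: 0` disables the navigation timeout entirely, so `goto` can wait indefinitely. Feel free to try the same change and report whether it works for you.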
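
If disabling the timeout completely is a concern (a Cloud Function has its own execution limit), a possible variant is to keep a finite timeout and then wait for the selector the scraper actually reads, as suggested in the comments on the question. This is only a sketch: the helper name `gotoAndWaitFor` and the 60-second default are assumptions, and `page` is expected to be a Puppeteer `Page`.

```javascript
// Hypothetical helper: navigate with a bounded timeout, then wait until the
// element the scraper needs is present instead of waiting for network idle.
async function gotoAndWaitFor(page, url, selector, timeoutMs = 60000) {
  // 'domcontentloaded' resolves earlier than 'load' or 'networkidle2'.
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
  // Throws a TimeoutError if the selector does not appear within timeoutMs.
  await page.waitForSelector(selector, { timeout: timeoutMs });
}
```

In the question's code this would be used as `await gotoAndWaitFor(page, url, 'div.cAIbCF')` in place of the `page.goto` call.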

Sahil Patel