
I am using Docker and docker-compose to run a Node.js scraper with Puppeteer. My troubleshooting so far is as follows:

EXPECTED OUTPUT: The title of the Wikipedia main page is printed.

CASE 1: When function1() is called with await, the process stops.

OUTPUT: 
Browser is running
// console.log('function1 end') does not execute

CASE 2: If function1() is called without await, it does not appear to execute (no title is printed), but the console.log after it is executed.

OUTPUT:
Browser is running
function1 end

How can I run function1() using await and get the page title as output?
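One way to narrow down where the awaited call stalls is to put a time limit on it, so that a hang turns into a visible error instead of silence. The sketch below is only an illustration: `withTimeout` is a hypothetical helper that is not part of Puppeteer, and the 20000 ms limit in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    )
  })
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Usage sketch inside looper(), assuming function1 from the question:
// async function looper() {
//   try {
//     await withTimeout(function1(), 20000, 'function1')
//   } catch (err) {
//     console.error('function1 failed:', err)
//   }
//   console.log('function1 end')
// }
```

With the try/catch in place, a rejected or timed-out step is logged rather than lost, which should at least show whether the call is hanging or throwing inside the container.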


const puppeteer = require('puppeteer')

// Shared browser instance, assigned in startPuppeteer()
let browser

async function function1() {
  const page = await browser.newPage()
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
  )
  await page.goto(
    'https://en.wikipedia.org/wiki/Main_Page',
    {
      waitUntil: 'networkidle2',
    },
  )
  console.log(await page.title())
  // Close the tab so pages do not pile up across runs
  await page.close()
}


async function looper() {
  await function1()
  // console.log is synchronous, so it needs no await
  console.log('function1 end')
}

async function startPuppeteer() {
  browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  })

  console.log('Browser is running')
  // Run the scraper every 30 seconds
  setInterval(looper, 30000)
}

startPuppeteer()
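A side note on the scheduling above: setInterval starts a new run every 30 seconds whether or not the previous one has finished, so slow page loads can lead to overlapping runs. A minimal sketch of an alternative, assuming a hypothetical helper named `runRepeatedly` (not part of Node.js or Puppeteer), waits for each run to finish before scheduling the next. Unlike setInterval, it also starts the first run immediately.

```javascript
// Hypothetical helper: runs `task` repeatedly, waiting `delayMs` after each run finishes.
function runRepeatedly(task, delayMs) {
  let stopped = false
  let timer = null

  async function tick() {
    if (stopped) return
    try {
      await task()
    } catch (err) {
      // Log and keep going so one failed run does not end the loop.
      console.error('task failed:', err)
    }
    // Schedule the next run only after this one has settled.
    if (!stopped) timer = setTimeout(tick, delayMs)
  }

  tick()

  // Returns a function that prevents any further runs.
  return () => {
    stopped = true
    clearTimeout(timer)
  }
}

// Usage sketch, assuming looper from the question:
// const stop = runRepeatedly(looper, 30000)
```

This does not explain the hang in Docker by itself, but it rules out overlapping runs as a contributing factor.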

Dockerfile


FROM buildkite/puppeteer:latest

USER root

COPY . /app

RUN cd /app && npm install 

EXPOSE 8000

WORKDIR /app

CMD npm run start

docker-compose.yml

version: "3.9"
services:
  web:
    build: .
    ports:
      - "8000:8000"

Joshua Santiago
  • Can you show the function1 and function2 code? How do you know they're not executed? Do you have console.log in those functions? – Molda Dec 21 '21 at 16:26
  • There is a console.log after starting headless Chrome, and it always runs. The scraper functions execute without Docker; when using Docker, await function1 and await function2 are not executed. Additionally, I edited the post to show the console.logs. – Joshua Santiago Dec 21 '21 at 16:35
  • Well there must be something in the function1 and/or function2 which prevents it from running. Without seeing the code it's impossible to help. – Molda Dec 21 '21 at 16:45
  • I added the main function, thank you for the assistance – Joshua Santiago Dec 21 '21 at 17:09

1 Answer


Puppeteer did not work when I deployed the scraper with headless Chrome in Docker. I had to switch to Heroku and implement webpacks. With the help of this answer: https://stackoverflow.com/a/55090914/7696261 I was able to deploy the service.

Joshua Santiago