
Given the HTML snippet <div id="gs_ab_md"><div class="gs_ab_mdw">About 3,260 results</div></div>, the code below works on Windows, but it does not work on Linux (Ubuntu 16.04). I've already installed the extra packages mentioned here: Why does this pyppeteer code only work on windows?. Any idea?

import asyncio
import pyppeteer
from pyppeteer import launch

async def main():
    browser = await launch({
        'headless': True
    })
    page = await browser.newPage()
    await page.goto('WEBPAGE_URL')
    element = await page.querySelector('#gs_ab_md .gs_ab_mdw')
    title = await page.evaluate('(element) => element.textContent', element)
    print(title)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Execution ends with the following error:

Traceback (most recent call last):
  File "p.py", line 16, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "p.py", line 7, in main
    browser = await launch({'headless': True})
  File "/home/developer/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 306, in launch
    return await Launcher(options, **kwargs).launch()
  File "/home/developer/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 167, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/home/developer/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 226, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
cloudSAPiens
  • This isn't exactly a [mcve] -- having the website you're scraping is important for reproducibility, otherwise the code is pretty much the same as the link; it doesn't do anything particularly unusual that'd account for the error. As an aside, I suggest `await page.waitForSelector('#gs_ab_md .gs_ab_mdw')` to ensure your target element is actually on the page before trying to query it. That said, this safety precaution wouldn't result in the error you're showing. – ggorlen Aug 21 '21 at 15:15
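As a sketch of that suggestion (assuming the same page and selector as in the question; WEBPAGE_URL is still a placeholder), the wait goes right before the query:

import asyncio

from pyppeteer import launch

async def main():
    browser = await launch({'headless': True})
    page = await browser.newPage()
    await page.goto('WEBPAGE_URL')  # placeholder URL, as in the question
    # Wait until the element exists in the DOM before querying it.
    await page.waitForSelector('#gs_ab_md .gs_ab_mdw')
    element = await page.querySelector('#gs_ab_md .gs_ab_mdw')
    title = await page.evaluate('(element) => element.textContent', element)
    print(title)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())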

2 Answers


Hey, this is from the documentation; maybe it will help you.

https://miyakogi.github.io/pyppeteer/

Example to get element’s inner text:

element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)
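For completeness, here is a minimal runnable sketch around that snippet; https://example.com and the h1 selector are only stand-ins for illustration:

import asyncio

from pyppeteer import launch

async def main():
    browser = await launch({'headless': True})
    page = await browser.newPage()
    await page.goto('https://example.com')  # stand-in URL for illustration
    # The two lines from the documentation: get the element handle, then
    # evaluate a JS function against it to read its text content.
    element = await page.querySelector('h1')
    title = await page.evaluate('(element) => element.textContent', element)
    print(title)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())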
DaveMier88

In my case this was a browser compatibility issue. For whatever reason, the Chromium package automatically installed by pyppeteer did not work on my Linux machine, so I updated my Chromium browser and used that instead.

Make sure you add the executablePath parameter to your browser launch call, like this:

browser = await launch(
    headless=True,
    executablePath='/usr/bin/chromium-browser'  # or your path to chromium
)

If it still does not work with this, try installing different versions of Chromium. I use the latest pyppeteer version and Chromium version 88, if that helps.
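Applied to the script from the question, a sketch would look like this (the /usr/bin/chromium-browser path is an assumption; substitute whatever `which chromium-browser` reports on your machine):

import asyncio

from pyppeteer import launch

async def main():
    # Point pyppeteer at a system-installed Chromium instead of the one it
    # downloads itself; the path below is an example, adjust it for your machine.
    browser = await launch(
        headless=True,
        executablePath='/usr/bin/chromium-browser',
    )
    page = await browser.newPage()
    await page.goto('WEBPAGE_URL')  # placeholder, as in the question
    element = await page.querySelector('#gs_ab_md .gs_ab_mdw')
    title = await page.evaluate('(element) => element.textContent', element)
    print(title)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())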