
I'm trying to run Playwright web automation on Google Colab, but I can't get it to work with the event loop that Colab is already running.

This is what I tried:

!pip install playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.google.com")

    page.wait_for_timeout(3000)
    browser.close()

which gave me this error:

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 33))

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-29-bc0f59648c4a> in <module>()
      1 from playwright.sync_api import sync_playwright
      2 
----> 3 with sync_playwright() as p:
      4     browser = p.firefox.launch(headless=True)
      5     page = browser.new_page()

/usr/local/lib/python3.7/dist-packages/playwright/sync_api/_context_manager.py in __enter__(self)
     44             raise Error(
     45                 """It looks like you are using Playwright Sync API inside the asyncio loop.
---> 46 Please use the Async API instead."""
     47             )
     48 

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.

So I tried using the async API instead:

import time
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page(storage_state='auth.json')
        await page.goto('https://www.instagram.com/explore/tags/alanzoka/')
        time.sleep(6)
        html = await page.content()

        time.sleep(5)

        # await browser.close()


asyncio.run(main())

But that gave me this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-34-c582898e6ee9> in <module>()
     27 
     28 
---> 29 asyncio.run(main())

/usr/lib/python3.7/asyncio/runners.py in run(main, debug)
     32     if events._get_running_loop() is not None:
     33         raise RuntimeError(
---> 34             "asyncio.run() cannot be called from a running event loop")
     35 
     36     if not coroutines.iscoroutine(main):

RuntimeError: asyncio.run() cannot be called from a running event loop

I need a working solution for setting up and using the Playwright package on Google Colab.

Himanshu Poddar

2 Answers

Not sure about Colab, but in a normal Jupyter notebook you do:

import nest_asyncio
nest_asyncio.apply()

Install it with pip install nest-asyncio. It patches asyncio so that the already-running event loop can be re-entered, which lets asyncio.run() work inside a notebook.

Edit: You are also trying to run a GUI instance of Chrome with headless=False. Change that to headless=True; Colab doesn't run with a GUI.
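
To illustrate why the asyncio.run(main()) call in the question fails: notebook kernels such as Jupyter and Colab already run an asyncio event loop, and asyncio.run() refuses to start a second one from inside it. The following is a minimal standard-library sketch, not Playwright code; `inner` and `notebook_cell` are hypothetical names, and `notebook_cell` merely stands in for code executing inside a cell.

```python
import asyncio


async def inner() -> str:
    # Stand-in for any coroutine, e.g. the question's main()
    await asyncio.sleep(0)
    return "done"


async def notebook_cell():
    """Simulates a notebook cell: an event loop is already running here."""
    coro = inner()
    try:
        asyncio.run(coro)  # the same kind of call that failed in the question
        error = ""
    except RuntimeError as exc:
        coro.close()  # discard the unused coroutine to avoid a warning
        error = str(exc)
    # Awaiting on the loop that is already running works without any patching.
    result = await inner()
    return error, result


error, result = asyncio.run(notebook_cell())
print(error)
print(result)
```

nest_asyncio.apply() works around this by making the running loop re-entrant, so the nested asyncio.run() call no longer raises.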

Barry the Platipus
  • Doing this gives me another error: `Attempt to free invalid pointer 0x29000020c5a0` – Himanshu Poddar Jul 22 '22 at 19:00
  • Maybe try restarting that Colab instance, or start a new one? Basically, in a normal notebook, you add `nest_asyncio.apply()` right after the imports, then write your async code, and it runs with no issues. – Barry the Platipus Jul 22 '22 at 19:04
  • `nest_asyncio` seems to affect performance, in my code it did. Downgrading `tornado` didn't, but there is extra annoyance since the runtime must be restarted. – Denny Ceccon Sep 05 '22 at 13:37

I just found this answer in another SO comment. I confirmed this works. https://stackoverflow.com/a/74518471/15898955

!apt install chromium-chromedriver

!pip install nest_asyncio
!pip install playwright

After installing all the dependencies above, you can run the Playwright script in Colab.

import nest_asyncio
nest_asyncio.apply()

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # launch_persistent_context returns a BrowserContext, not a Browser
        context = await p.chromium.launch_persistent_context(
            executable_path="/usr/bin/chromium-browser",
            user_data_dir="/content/random-user"
        )
        page = await context.new_page()
        await page.goto("https://google.com")
        title = await page.title()
        print(f"Title: {title}")
        await context.close()

asyncio.run(main())
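
Regarding the comment above about nest_asyncio affecting performance, one possible alternative is to skip asyncio.run() entirely. IPython-based notebooks support top-level await, so a coroutine can be awaited directly in a cell. This is a sketch under assumptions rather than a verified Colab recipe: `fetch_title` is a hypothetical helper, and it assumes Playwright's own Chromium build has been installed (for example with !playwright install chromium, possibly followed by !playwright install-deps) instead of the apt package used above.

```python
async def fetch_title(url: str) -> str:
    """Open `url` in headless Chromium and return the page title."""
    # Imported here so that defining the function does not require
    # Playwright to be importable yet.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # headless=True because Colab has no display
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await browser.close()


# In a notebook cell (not in a plain .py script), await it directly:
#     title = await fetch_title("https://google.com")
#     print(f"Title: {title}")
```

Because the coroutine runs on the notebook's existing loop, no nested loop is needed and nest_asyncio.apply() can be left out.
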
gohaku