Scraping JavaScript page with python requests and asyncio in Jupyter notebook

Question

This is far from a duplicate of that question: it doesn't use requests for scraping at all, only for the session and for getting the page content. I use requests that way as well, together with Beautiful Soup.

I have also tried this, but it doesn't explain how to use requests effectively to get JavaScript-rendered content either.

I'm trying to scrape information from a web page that is rendered by JavaScript. I'm using the requests-html module in a Jupyter notebook.

When I use the following sample code:

import asyncio
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()

r = await asession.get('http://python-requests.org')
r.html.render()
r.html.search('Python 2 will retire in only {months} months!')['months']

I get this error:

RuntimeError: This event loop is already running

I need advice on how to implement this comment to make it work, because when I type the following in a Jupyter notebook:

asyncio.get_event_loop()

I get:

<_WindowsSelectorEventLoop running=True closed=False debug=False>

So I need a way to use the existing event loop in the Jupyter notebook.
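A minimal, network-free sketch of why this error appears. As far as I can tell from the requests-html source, the synchronous `render()` drives the event loop with `run_until_complete`, and that call refuses to run on a loop that is already running, which is exactly the state Jupyter reports above. `fetch_page` and `blocking_render` are hypothetical stand-ins, and `asyncio.run(notebook_cell())` plays the role of the notebook's running loop.

```python
import asyncio


async def fetch_page() -> str:
    # Hypothetical stand-in for the async work render() needs (no network here)
    await asyncio.sleep(0)
    return "<html>rendered</html>"


def blocking_render() -> str:
    # Hypothetical synchronous helper that drives the loop itself,
    # the way a blocking render() call would
    loop = asyncio.get_event_loop()
    coro = fetch_page()
    try:
        return loop.run_until_complete(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def notebook_cell() -> str:
    # Code in a Jupyter cell already runs inside a running loop; simulate that here
    try:
        return blocking_render()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


message = asyncio.run(notebook_cell())
print(message)  # RuntimeError: This event loop is already running
```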

Possible duplicate of [Using python requests and beautiful soup to pull text](https://stackoverflow.com/questions/39757805/using-python-requests-and-beautiful-soup-to-pull-text) — simkusr, Oct 29 '19 at 05:48

score 1 · Answer 1 · answered Oct 29 '19 at 07:29


I'm not very familiar with asyncio, but I believe you are supposed to await your requests inside a coroutine function if you are using AsyncHTMLSession:

from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()

async def get_results():
    r = await asession.get('http://python-requests.org')
    return r

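# run() takes coroutine functions, drives the session's event loop itself and
# returns a list of results, so call it from a plain script, not a Jupyter cell
a = asession.run(get_results)
print(a[0].html.search('Python 2 will retire in only {months} months!')) # returns None: the text is not on the page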
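The `asession.run(get_results)` call takes coroutine *functions* and returns a list of their results, which is why the code indexes `a[0]`. A rough stdlib sketch of that pattern, using a hypothetical `run_all` helper built on `asyncio.gather` rather than whatever requests-html does internally. Like any helper that starts the loop itself, it only works where no loop is running yet.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


def run_all(*coro_fns: Callable[[], Awaitable[Any]]) -> List[Any]:
    """Hypothetical helper: call each coroutine function, run them concurrently,
    and return their results as a list (in argument order, thanks to gather)."""

    async def main() -> List[Any]:
        return await asyncio.gather(*(fn() for fn in coro_fns))

    # asyncio.run() starts a fresh loop; it raises RuntimeError if one is
    # already running, e.g. inside a Jupyter cell
    return asyncio.run(main())


async def get_first() -> str:
    await asyncio.sleep(0.01)  # placeholder for a real request
    return "first"


async def get_second() -> str:
    await asyncio.sleep(0)  # placeholder for a real request
    return "second"


results = run_all(get_first, get_second)
print(results)  # ['first', 'second']
```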

Without AsyncHTMLSession:

from requests_html import HTMLSession
session = HTMLSession()

r = session.get('http://python-requests.org')

# Executes the page's JavaScript; downloads Chromium on first use
r.html.render()

print(r.html.search('Python 2 will retire in only {months} months!')) # None: the text is not on the page
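Both snippets print `None` because `search()` returns `None` when the template matches nothing on the page. requests-html delegates this to the `parse` library; as a rough, hypothetical stdlib stand-in (not `parse`'s exact semantics), a `{name}` template can be turned into a regex with named groups:

```python
import re
from typing import Dict, Optional


def template_search(template: str, text: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: find `template` (with {name} fields) anywhere in
    `text`; return the captured fields, or None if there is no match."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    match = re.search(pattern, text)
    return match.groupdict() if match else None


template = "Python 2 will retire in only {months} months!"
print(template_search(template, "<p>Python 2 will retire in only 3 months!</p>"))
# {'months': '3'}
print(template_search(template, "<p>Requests: HTTP for Humans</p>"))
# None
```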


P.hunter


Thank you, see the update to my question. Do you know how I can use the existing async loop in a Jupyter notebook? – Hrvoje Oct 29 '19 at 09:35
@Harvey Hi, do you mean accessing the currently running process in a Jupyter notebook? – P.hunter Oct 29 '19 at 10:31
Yes, see this comment: https://github.com/jupyter/notebook/issues/3397#issuecomment-381440076 – Hrvoje Oct 29 '19 at 10:34
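One way to reuse Jupyter's running loop, assuming IPython 7 or later (which allows `await` at the top level of a cell): never call a helper that starts or drives the loop, and instead `await` the coroutine, or schedule it on the running loop with `asyncio.get_running_loop().create_task(...)`. For requests-html that would mean `r = await asession.get(url)` followed by `await r.html.arender()`, the async counterpart of `render()`. The sketch below is network-free: `fetch_text` is a hypothetical stand-in for those calls, and `notebook_cell` represents the body of a cell.

```python
import asyncio
from typing import List

URL = "http://python-requests.org"


async def fetch_text(url: str) -> str:
    # Hypothetical stand-in for, in a real notebook:
    #   r = await asession.get(url)
    #   await r.html.arender()
    #   return r.html.text
    await asyncio.sleep(0)
    return f"rendered text of {url}"


async def notebook_cell() -> List[str]:
    # The loop that is already running (Jupyter's loop in a real notebook)
    loop = asyncio.get_running_loop()
    # Option 1: await the coroutine directly, as a top-level `await` in a cell would
    first = await fetch_text(URL)
    # Option 2: schedule it on the existing loop as a task, then await the task
    task = loop.create_task(fetch_text(URL))
    second = await task
    return [first, second]


# Outside Jupyter, asyncio.run() stands in for the notebook's running loop
results = asyncio.run(notebook_cell())
print(results)
```

Neither option starts or re-enters the loop, which is why they avoid the `RuntimeError`.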
