6

I am trying to write a simple script that, given an arbitrary URL, returns the title tag of that website. Because many of the URLs I want to resolve require JavaScript, I need to use something like requests_html's render function. However, I have run into an issue with the library where rendering the example URL below never terminates. I have tried the timeout argument of the render call, and it did not work. Can anyone help me figure out how to get this to time out properly, or suggest some other workaround to make sure it doesn't get stuck?

This is my current code that does not terminate (it gets stuck on the render call):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('http://shan-shui-inf.lingdong.works/')
# render with JS
r.html.render(sleep=1, keep_page=True)
# Also does not work: r.html.render(sleep=1, keep_page=True, timeout=3)

title = r.html.find('title', first=True).full_text

I have already tried solutions like "Timeout on a function call" and "Python timeout decorator", which, strangely enough, still did not time out.

NOTE: I am using Python 3.7.4 64-bit on Windows 10.

Davie88
  • I'm having a similar problem: for me it renders a certain URL 30 times and then hangs, but it never times out. – tym Jun 16 '20 at 15:58
  • Did you find a solution? – Rasit aydin Dec 30 '20 at 18:39
  • This happens to me as well for a different URL. I believe the problem lies in the _async_render method, specifically when a TimeoutError is caught. In that case, there is an attempt to perform `await page.close()`, which never returns. Possibly related to this [issue](https://bugs.python.org/issue40152). – PeNpeL Mar 15 '21 at 16:59
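
If, as the last comment suggests, the hang happens inside the render call itself, an in-process timeout may never get a chance to fire. One possible workaround is to run the render in a separate Python process and kill that process after a deadline. The following is a minimal sketch under that assumption, not a tested fix for this URL; `RENDER_SCRIPT`, `run_with_hard_timeout`, and `get_title` are hypothetical names introduced here, and the 30-second default is arbitrary.

```python
import subprocess
import sys
import tempfile

# Child script (illustrative): mirrors the code from the question and
# prints the page title. It runs in its own interpreter.
RENDER_SCRIPT = """
import sys
from requests_html import HTMLSession

session = HTMLSession()
try:
    r = session.get(sys.argv[1])
    r.html.render(sleep=1)
    print(r.html.find('title', first=True).full_text)
finally:
    session.close()
"""


def run_with_hard_timeout(script, args, timeout):
    """Run `script` in a fresh interpreter.

    Returns the child's stdout as a stripped string, or None if the child
    exceeds `timeout` seconds or exits with a non-zero status.
    """
    # Output goes to a temporary file instead of a pipe, so the parent never
    # blocks on reading from a child that has been killed.
    with tempfile.TemporaryFile(mode="w+") as out:
        proc = subprocess.Popen(
            [sys.executable, "-c", script, *args],
            stdout=out,
            stderr=subprocess.DEVNULL,
        )
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()   # hard stop: does not depend on render() returning
            proc.wait()   # reap the killed child
            return None
        if proc.returncode != 0:
            return None
        out.seek(0)
        return out.read().strip()


def get_title(url, timeout=30):
    """Return the rendered page title, or None on timeout or failure."""
    return run_with_hard_timeout(RENDER_SCRIPT, [url], timeout)
```

Calling `get_title('http://shan-shui-inf.lingdong.works/', timeout=30)` would then return either the title or `None`. Killing the Python child may leave its Chromium process running, so some extra cleanup may still be needed.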

2 Answers

0

I would suggest calling r.session.close() at the end. This worked for me.
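
As a rough illustration of this suggestion, the sketch below closes the session in a `finally` block so that it is released even if fetching or rendering raises. `fetch_title` and its `session_factory` parameter are hypothetical names added here for illustration. Note that this cannot help if `render` itself never returns.

```python
def fetch_title(url, session_factory=None):
    """Fetch `url`, render its JavaScript, and return the <title> text.

    `session_factory` is a hypothetical hook for illustration; by default
    it is requests_html.HTMLSession.
    """
    if session_factory is None:
        # Imported lazily so the function can be exercised with a stand-in.
        from requests_html import HTMLSession
        session_factory = HTMLSession

    session = session_factory()
    try:
        r = session.get(url)
        r.html.render(sleep=1)  # render with JS, as in the question
        title = r.html.find('title', first=True)
        return title.full_text if title is not None else None
    finally:
        # Always close the session (and the browser it may have started).
        session.close()
```

When resolving many URLs in a loop, calling `fetch_title(url)` once per URL means each iteration opens and closes its own session, which matches the "close at the end of the loop" behavior described in the comments below.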

Richu
  • Could you please show two results, one with `r.session.close()` added and the other without? – Fanchen Bao Dec 15 '20 at 01:58
  • This is [after](https://drive.google.com/file/d/1GoyAykByJeNSCP1DZE431cNISarNLkp6/view?usp=sharing) I close the session at the end of the loop, and [before](https://drive.google.com/file/d/1jNvQJLoLql7QEbuWgkgHc7zOAaKfYp0g/view?usp=sharing), where it just hangs after some iterations. – Richu Dec 15 '20 at 12:42
  • I cannot access your link. It asks me to "Request access". Is it possible that you post it somewhere open to everyone? – Fanchen Bao Dec 16 '20 at 01:41
  • I think it will work now. I've changed the sharing setting so that anyone with the link can view it. I just put r.session.close() at the end of the loop, and it no longer hangs after a few iterations. – Richu Dec 16 '20 at 01:57
0

OK, I'm quite late here, but this is what I did:

pip install -U pyppeteer

(pip installed version 0.2.6 for me.)

After that it worked, though I'm not sure why.

(Unrelated)
If you want the Chromium browser window to appear on screen, you'll need to edit requests_html.py (somewhere in site-packages) and change headless=True to headless=False on line 714.

aph