Questions tagged [python-requests-html]

Requests-HTML is a Python HTTP library built around the requests API, adding support for parsing HTML (with optional headless-browser support to render JavaScript).


Requests-HTML is a web-scraping library written in Python and released under the MIT license.

This library intends to make parsing HTML (e.g., scraping the web) as simple as possible, building on top of the Requests library for the HTTP layer. It supports XPath and CSS selectors, user-agent spoofing, and optional headless-browser support for executing JavaScript on the page.

535 questions
15 votes, 2 answers

Python: What happens if script stops while requests.get() is executing?

I know that requests.get() provides an HTTP interface so that the programmer can make various requests to an HTTP server. That tells me that somewhere a port must be opened so that the request can happen. Taking that into account, what would happen…
Fabián Montero
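If a script is interrupted (for example, Ctrl+C raises KeyboardInterrupt) while a request is in flight, the exception propagates out of the blocking call, any enclosing `with` block or `finally` clause still runs, and whatever sockets the process still holds are closed by the operating system when the process exits. A minimal stdlib sketch of the first point, assuming a local `socket.socketpair()` as a stand-in for the HTTP connection that requests manages internally:

```python
import socket


def interrupted_request() -> socket.socket:
    """Simulate a request that is interrupted mid-flight.

    socket.socketpair() stands in for a real HTTP connection (assumption);
    KeyboardInterrupt is raised by hand to mimic pressing Ctrl+C.
    """
    client, server = socket.socketpair()
    try:
        with client:  # closes the socket even if an exception escapes
            client.sendall(b"GET / HTTP/1.1\r\n\r\n")
            raise KeyboardInterrupt  # mimic Ctrl+C while waiting for a reply
    except KeyboardInterrupt:
        pass  # a real script would usually let this propagate and exit
    finally:
        server.close()
    return client


sock = interrupted_request()
print(sock.fileno())  # -1: the socket was closed by the with-block
```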
10 votes, 3 answers

Trouble getting the trade-price using "Requests-HTML" library

I've written a script in Python to get the price of the last trade from a JavaScript-rendered webpage. I can get the content if I choose to go with Selenium. My goal here is not to use any browser simulator like Selenium or something because the latest…
SIM
9 votes, 1 answer

Python Requests-HTML Render() - No Content

I'd like to scrape a page whose content seems to be rendered by an app referenced in the HTML like:
I'm using the render() method from the Requests-HTML Python library like so: with HTMLSession()…
Dyneken
8 votes, 4 answers

Why do I get the error "Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead"?

I'm running the code provided by @Dan-Dev in his answer.
    from requests_html import HTMLSession
    url = 'https://www.thefreedictionary.com/love'
    session = HTMLSession()
    r = session.get(url)
    r.html.render()
    lang_bar = r.html.find('#LangBar',…
Akira
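This error appears to come from a guard in `HTMLSession`: before launching the headless browser it checks whether the current asyncio event loop is already running (as it is inside a Jupyter notebook or another async framework) and refuses to block on it. The stdlib sketch below reproduces that kind of check; `already_in_event_loop` is a hypothetical helper, not part of requests-html. Inside a running loop, the alternative the message names is `AsyncHTMLSession` together with `await r.html.arender()`.

```python
import asyncio


def already_in_event_loop() -> bool:
    """Return True if called while an asyncio event loop is running.

    Hypothetical helper mirroring the kind of check that makes HTMLSession
    raise "Cannot use HTMLSession within an existing event loop".
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:  # raised when no loop is running in this thread
        return False
    return True


async def inside_a_loop() -> bool:
    # Code in a coroutine (or a notebook cell) runs inside a live loop.
    return already_in_event_loop()


print(already_in_event_loop())       # False: plain synchronous code
print(asyncio.run(inside_a_loop()))  # True: called from within a loop
```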
6 votes, 2 answers

Python requests_html render runs forever on certain URLs

I am trying to write a simple script that, given an arbitrary URL, will return the title tag of that website. Because many of the URLs I want to resolve need JavaScript enabled, I need to use something like requests_html's render function to…
Davie88
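One way to keep a render from blocking forever is to put a hard deadline around it. With `AsyncHTMLSession`, `await r.html.arender()` is a coroutine, so it can be wrapped in `asyncio.wait_for`, which cancels it once the timeout expires. The sketch below uses a hypothetical stand-in coroutine, `fake_render`, instead of a real render so it runs without a browser; whether cancellation cleanly stops the headless browser in every case is an assumption worth testing.

```python
import asyncio
from typing import Optional


async def fake_render(delay: float) -> str:
    """Hypothetical stand-in for `await r.html.arender()`."""
    await asyncio.sleep(delay)
    return "<title>Example</title>"


async def render_with_deadline(delay: float, deadline: float) -> Optional[str]:
    """Return the rendered HTML, or None if it takes longer than `deadline` seconds."""
    try:
        return await asyncio.wait_for(fake_render(delay), timeout=deadline)
    except asyncio.TimeoutError:  # wait_for cancels the inner coroutine first
        return None


print(asyncio.run(render_with_deadline(delay=0.01, deadline=1)))   # finishes in time
print(asyncio.run(render_with_deadline(delay=10, deadline=0.05)))  # None: gave up
```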
6 votes, 1 answer

Scraping JavaScript Website With BeautifulSoup 4 & Requests_HTML

I'm learning how to build a scraper for Reverb.com after getting my scraper for another website to work properly. Reverb, however, has been more challenging to extract information from, and the model from my old scraper isn't…
6 votes, 1 answer

requests-html "RuntimeError: There is no current event loop in thread 'Thread-1'" when using it on a Flask endpoint

I have a simple Flask API with one endpoint that calls a method in another file to render some JavaScript from a site using requests-html:
    @app.route('/renderJavascript')
    def get_attributes():
        return…
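Flask's development server handles requests in worker threads, and `asyncio.get_event_loop()` never creates a loop for a non-main thread, so library code that calls it there raises this RuntimeError. A common workaround is to create and register an event loop for the worker thread before rendering. The stdlib sketch below shows the failure and the fix in a plain `threading.Thread`; `handler` is a hypothetical stand-in for the Flask view, and whether this alone is sufficient for a given requests-html version is an assumption.

```python
import asyncio
import threading


def handler(results: dict) -> None:
    """Hypothetical stand-in for a Flask view that ends up needing a loop."""
    try:
        asyncio.get_event_loop()  # the kind of call library code makes
    except RuntimeError:
        results["before"] = "no loop"  # the 'Thread-1' error from the title
    # Workaround: give this worker thread its own event loop.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results["after"] = asyncio.get_event_loop() is loop
    finally:
        asyncio.set_event_loop(None)
        loop.close()


results: dict = {}
worker = threading.Thread(target=handler, args=(results,))
worker.start()
worker.join()
print(results)  # {'before': 'no loop', 'after': True}
```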
6 votes, 3 answers

python3 SSL certificate problem when requests_html installs Chromium using pyppeteer

I'm running html.render() from the requests_html library. It is trying to install Chromium, but I am getting an error. I already tried pip install --upgrade certifi with and without sudo and got: Requirement already up-to-date: certifi in…
Jasem
5 votes, 1 answer

Requests-html results in OSError: [Errno 8] Exec format error when calling html.render()

I am using requests-html and trying the render function, with little success. When I run this script using python3.8:
    #!/usr/bin/python3
    from requests_html import HTML
    file = "scrape/temp_file2.html"
    with open(file) as html_file:
        source =…
5 votes, 2 answers

How to get raw html with absolute links paths when using 'requests-html'

When making a request using the requests library to https://stackoverflow.com:
    page = requests.get(url='https://stackoverflow.com')
    print(page.content)
I get the following: …
Mezo
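Relative links such as `/questions` can be resolved against the page URL with `urllib.parse.urljoin`; requests-html also exposes the resolved set as `r.html.absolute_links`. The stdlib sketch below shows the same idea with `html.parser`, collecting `href` values from a small hard-coded snippet (an assumption standing in for the live page) rather than rewriting the markup itself.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class AbsoluteLinkCollector(HTMLParser):
    """Collect every href on a page, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs alone and resolves relative ones.
                self.links.append(urljoin(self.base_url, value))


page = '<a href="/questions">Q</a> <a href="https://example.com/x">X</a>'
collector = AbsoluteLinkCollector("https://stackoverflow.com")
collector.feed(page)
print(collector.links)
# ['https://stackoverflow.com/questions', 'https://example.com/x']
```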
5 votes, 1 answer

Python requests html printed response in array

I am trying to check if a link contains http and print the URL.
    import requests
    from requests_html import HTMLSession
    import sys
    link = "http://www.tvil.me/view/93/4/8/v/%D7%90%D7%99%D7%99_%D7%96%D7%95%D7%9E%D7%91%D7%99_IZombie.html"
    enter_episodes =…
kaki
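In requests-html, `r.html.links` and `r.html.absolute_links` are sets of strings, so printing one shows the whole collection in braces. To test each link and print the matches one per line, iterate over it. A minimal sketch, assuming a hard-coded set in place of `r.html.absolute_links`; checking the parsed scheme avoids false matches where "http" merely appears somewhere in the string:

```python
from typing import Iterable, List
from urllib.parse import urlparse


def http_links(links: Iterable[str]) -> List[str]:
    """Return the links whose scheme is http or https, sorted for stable output."""
    return sorted(link for link in links if urlparse(link).scheme in ("http", "https"))


# Hard-coded stand-in for r.html.absolute_links (a set of strings).
links = {"http://www.example.com/a", "mailto:someone@example.com", "https://example.org/b"}
for url in http_links(links):
    print(url)  # one URL per line instead of the whole set
```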
4 votes, 1 answer

Python requests response 403 forbidden

So I am trying to scrape this website: https://www.auto24.ee. I was able to scrape data from it without any problems, but today it gives me "Response 403". I tried using proxies and passing more information in the headers, but unfortunately nothing seems to…
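A 403 that appears suddenly can mean the site has started rejecting requests it classifies as automated; a common first check is whether the request carries a browser-like `User-Agent`. The stdlib sketch below only builds the request object (no network call) to show where the header goes; the header value is an illustrative assumption, and sites with active bot protection may still refuse the request.

```python
import urllib.request

# Illustrative browser-like value (assumption); any realistic UA string works.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = build_request("https://www.auto24.ee")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
# Sending it would be: urllib.request.urlopen(req, timeout=10)
```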
4 votes, 0 answers

requests_html + Django "There is no current event loop in thread 'Thread-1'."

Hello, in Django I have this configuration in urls.py:
    urlpatterns = [
        path('', views.index),
    ]
and in views.py, the index function, where I pass a URL through a variable, uses requests_html:
    from django.http import HttpResponse
    from…
Javi
4 votes, 2 answers

Python requests-html: Render html with cookies

What I'm trying to do:
    from requests_html import HTMLSession
    with HTMLSession() as s:
        s.get('url', cookies=my_cookie_jar)
        s.html.render()
        print(s.html.html)
I want to access a page where I need to log in. I already logged in using a…
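Two details may matter here. First, in requests-html the `.html` attribute belongs to the response returned by `s.get()`, not to the session `s`. Second, the headless browser started by `render()` may not see the session's cookies (an assumption to verify for your version). One approach is to flatten the cookie jar into plain dicts that a browser API accepts; pyppeteer's `page.setCookie(*cookies)` takes dicts with keys such as `name`, `value`, `domain` and `path`. The sketch converts a stdlib `http.cookiejar.CookieJar` (requests' `RequestsCookieJar` subclasses it); `jar_to_dicts` and `make_cookie` are hypothetical helpers.

```python
import http.cookiejar
from typing import Dict, List


def jar_to_dicts(jar: http.cookiejar.CookieJar) -> List[Dict[str, str]]:
    """Hypothetical helper: flatten a cookie jar into browser-style cookie dicts."""
    return [
        {"name": c.name, "value": c.value or "", "domain": c.domain, "path": c.path}
        for c in jar
    ]


def make_cookie(name: str, value: str, domain: str) -> http.cookiejar.Cookie:
    """Build a Cookie by hand; normally the session's login response fills the jar."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=True, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
print(jar_to_dicts(jar))
# [{'name': 'sessionid', 'value': 'abc123', 'domain': 'example.com', 'path': '/'}]
```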
4 votes, 1 answer

Why doesn't render() / requests-html scrape dynamic content?

Long story short: I switched from Selenium to Requests(-HTML). It works OK, but not in every case. Page: https://www.winamax.fr/paris-sportifs/sports/1/1/1 On load, it fetches dynamic content with English games (example: Sheffield United - West…
jeremoquai