I'm trying to scrape this page: http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp. The problem is that the contents of the table are loaded (probably through AJAX) only after the page itself has loaded.
My attempt:
import requests
from bs4 import BeautifulSoup
uri = 'http://www.fivb.org/EN/BeachVolleyball/PlayersRanking_W.asp'
r = requests.get(uri)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup)
But the div with id='BTechPlayM' remains empty, regardless of what I do. I've tried:
- Setting a timeout on the request: requests.get(uri, timeout=10)
- Passing headers
- Using eventlet to set a delay
- Most recently, using the Selenium library to drive PhantomJS (installed from npm), but this rabbit hole just kept going deeper and deeper.
Is there a way to send a request to a URI, wait X seconds, and then return the contents?
... Or to send a request to a URI, keep checking whether a div contains an element, and only return the contents once it does?
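To make the second idea concrete, here is a minimal sketch of the kind of polling I have in mind. The names wait_for_content, fetch and is_ready are hypothetical, made up for illustration. It also assumes that fetch returns the rendered HTML from something that actually runs the page's JavaScript. A plain requests.get would presumably keep returning the same empty div, since requests does not execute scripts.

```python
import time


def wait_for_content(fetch, is_ready, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until is_ready(result) is true.

    fetch    -- hypothetical callable returning the current page HTML
    is_ready -- hypothetical callable deciding whether the HTML is complete
    Raises TimeoutError if the content is still not ready after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        content = fetch()
        if is_ready(content):
            return content
        if time.monotonic() >= deadline:
            raise TimeoutError(f"content not ready after {timeout} seconds")
        # Pause between attempts so the page is not hammered in a tight loop.
        time.sleep(interval)
```

With a check such as `lambda html: '<table' in html`, this would return only once the div has been filled in, and give up with an error otherwise.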