How can I dynamically scrape page data?

Question

I have been trying for a few days now to get some data from a website that uses an ASMX POST request to retrieve the data I want. I have tried PHP cURL, Python, and now an HTML parser, and still have had no luck. The POST request is:

https://sports-itainment.biahosted.com/WebServices/SportEvents.asmx/GetEvents

{"champIds":["38"],"eventIds":[],"dateFilter":"All","marketsId":-1,"skinId":"betrebels"}

After a lot of tries, I found that this link provides the data I want:

https://sports-itainment.biahosted.com/generic/prelive.aspx?token=&clientTimeZoneOffset=-180&lang=en-Gb&walletcode=508729&skinid=betrebels&parentUrl=https://ps.equalsystem.com/ps/game/BIASportbook.action#sportids=&catids=28&champids=91

But when I try to open it with cURL or simply parse it with simple_html_dom, it doesn't show the data; it just displays some text. Any idea how I can get it? I have over 50 files of different attempts with no result, so it would be difficult to post my code.

When you were trying Python, were you using the [requests](http://docs.python-requests.org/en/master/) module? — cosinepenguin, Jul 18 '17 at 15:54
What does "still had no luck" mean specifically? What's not working? What's happening? What would you expect to happen? Try posting an [MCVE](https://stackoverflow.com/help/mcve). — Mike, Jul 18 '17 at 15:57
Check out http://jmeter.apache.org/ - it's a bit hard to learn, but may be the right solution in your case — Y.L, Jul 18 '17 at 15:59
I tried requests, session, json, PyQt4, etc. Still no luck. My last attempt was in PHP, and I saw that the page I get as a response has the elements in the code, but as JSON. I tried to json_decode them and got a blank page as a result. — Geraki, Jul 18 '17 at 15:59
Did you try using the python example below? Or are you set on using php? — cosinepenguin, Jul 19 '17 at 04:07

score 1 · Answer 1 · answered Jul 18 '17 at 16:19

I know this question is tagged as php, but it seems you are open to using Python as well so I hope this answer addresses your needs!

The issue you are running into is that the page's content is generated dynamically: JavaScript fetches and inserts the data after the initial page load. So your previous attempts at loading the page in Python (with requests, as you say) did fetch the page, but the HTML they received did not yet contain any of the data!
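For reference, the background request from the question could in principle be rebuilt outside a browser. This is only a minimal sketch using Python's standard library: the URL and payload are copied from the question, while the `Content-Type` header and the helper name `build_events_request` are my assumptions. I have not checked whether the service also needs cookies, a token, or other headers, so the sketch only builds the request and leaves the actual call commented out.

```python
import json
import urllib.request

# Endpoint and payload are copied verbatim from the question.
URL = "https://sports-itainment.biahosted.com/WebServices/SportEvents.asmx/GetEvents"
PAYLOAD = {
    "champIds": ["38"],
    "eventIds": [],
    "dateFilter": "All",
    "marketsId": -1,
    "skinId": "betrebels",
}


def build_events_request(url=URL, payload=PAYLOAD):
    """Build (but do not send) the JSON POST request. Hypothetical helper."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


request = build_events_request()

# Sending it is untested against this site and may require extra headers:
# with urllib.request.urlopen(request, timeout=10) as response:
#     data = json.loads(response.read().decode("utf-8"))
```

If the server rejects a request like this, rendering the page in a real browser engine is the more robust route.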

To scrape the site you link to in your question, I would highly recommend using PhantomJS, a headless browser, driven through Selenium's Python bindings. This SO question has a few good answers on how to install PhantomJS for use with Selenium. PhantomJS lets the page load fully (including running the JS that actually populates it with the table information you want).

Then, once both of these dependencies are installed, you can run this code:

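from selenium import webdriver
from bs4 import BeautifulSoup

# Start a headless PhantomJS browser (the phantomjs binary must be on your PATH).
# Note: webdriver.PhantomJS exists only in older Selenium releases (removed in Selenium 4).
driver = webdriver.PhantomJS()
# Load the page so that its JavaScript runs and fills in the data
driver.get('https://sports-itainment.biahosted.com/generic/prelive.aspx?token=&clientTimeZoneOffset=-180&lang=en-Gb&walletcode=508729&skinid=betrebels&parentUrl=https://ps.equalsystem.com/ps/game/BIASportbook.action#sportids=&catids=28&champids=91')
# Parse the rendered HTML; naming the parser explicitly avoids a bs4 warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
# Collect every <tbody> element from the rendered page
tables = soup.find_all('tbody')
print(tables)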

You can then work with the page's contents through BeautifulSoup!
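As a rough illustration of that last step, here is a sketch of pulling the cell text out of each table row. `SAMPLE_HTML` is a made-up stand-in for `driver.page_source` and `extract_rows` is a hypothetical helper; the real page's markup will differ, so the tag names may need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real markup will differ.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>Team A - Team B</td><td>1.85</td><td>3.40</td></tr>
    <tr><td>Team C - Team D</td><td>2.10</td><td>3.10</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return the cell text of every row found inside a <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tbody in soup.find_all("tbody"):
        for tr in tbody.find_all("tr"):
            # Strip surrounding whitespace from each cell's text
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
            if cells:
                rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the real page you would pass `driver.page_source` to `extract_rows` instead of the sample string.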

This is a good source of additional information if you need it!

scrape html generated by javascript with python

Hope it helps!
