I'm very new to Python and programming in general, but I have enrolled in a few courses to improve my knowledge. It seems to be quite important to have a 'goal' in mind when learning, and one of mine is to successfully scrape and manipulate sports data.

I would like to scrape the results from https://www.britishhorseracing.com/racing/results/ but it looks like the page loads its data dynamically via JS.

There looks to be a LOT of data here: results going back ~20 years, plus multiple races for each racecourse on the day. From what I've read, Selenium and BeautifulSoup may offer some solutions here, but before I start experimenting I wanted to check with you guys how realistic this goal is, whether it's even achievable given how the website is structuring the data, and whether you have some pointers for how to get started.

Any help would be hugely appreciated.

Thanks

M. Sprout
  • Yes, Selenium and BeautifulSoup will make this data fairly easy to extract. Go through some tutorials and give it a try. – Alex Hall Mar 19 '18 at 13:59
  • I would suggest starting with smaller, easier projects and then expanding. Maybe create a little HTML site yourself and try to scrape that first. – Pizza lord - on strike Mar 19 '18 at 14:05

1 Answer

I'm not too familiar with Selenium or BeautifulSoup, but there are other JavaScript-based web scrapers. Some I know are NightmareJS, PhantomJS, and ZombieJS (all horror related, haha). NightmareJS runs on an Electron (Chromium) instance, PhantomJS is a headless WebKit browser that you script in JavaScript, and ZombieJS is a pure Node solution. I personally would recommend NightmareJS.

However, if you need to run NightmareJS on a server, that is a whole different ball game. NightmareJS requires a graphical interface, although there are modules that allow it to run in a terminal-only environment. If you would rather avoid that, you should be fine installing PhantomJS on the server and using that instead.

NightmareJS has a scroll option that would probably trigger the rest of the data to load.

Here is an issue found on GitHub. Some solutions are provided there.

If you would still rather use something like Selenium with Python, I'm pretty sure there is documentation describing how to scroll a page.
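For the Selenium route, here is a minimal sketch of the scrolling idea in Python. It assumes the results page loads more rows when you reach the bottom, which I have not verified for this site. `scroll_until_loaded` and `fetch_rendered_html` are hypothetical helper names, and the pause length and round limit are arbitrary. The only Selenium calls used are `execute_script`, `get`, `page_source` and `quit`.

```python
import time


def scroll_until_loaded(driver, pause_seconds=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls that caused more content to load.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded_rounds = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # crude wait for the JS to fetch more rows
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new appeared, so assume we reached the end
        last_height = new_height
        loaded_rounds += 1
    return loaded_rounds


def fetch_rendered_html(url):
    """Open `url` in Chrome via Selenium and return the HTML after scrolling."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_until_loaded(driver)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The string returned by `fetch_rendered_html` could then be handed to BeautifulSoup for parsing. A fixed `time.sleep` is the simplest possible wait; Selenium's explicit waits are likely a more robust choice once the basic flow works.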

I was originally going to suggest calling the API that the BHA site itself uses, which you can find in the network tab of the browser's developer tools. However, from a quick look at that API, it requires some authentication.
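If you do end up with valid credentials, replaying such a call from Python could look roughly like the sketch below. Everything specific here is an assumption: the URL is a placeholder, the bearer-token scheme is only one common way APIs authenticate, and `build_api_request` and `fetch_json` are hypothetical helper names. The real header names and values would have to be copied from what the network tab shows.

```python
import json
import urllib.request


def build_api_request(url, token):
    """Build a GET request carrying a bearer token and a JSON Accept header.

    The bearer scheme is an assumption; use whatever the real API expects.
    """
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_json(request, timeout=30):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URL and token, not the real BHA endpoint.
request = build_api_request("https://example.com/api/results?page=1", "TOKEN")
```

Calling `fetch_json(request)` would then return the parsed response, which is usually much easier to work with than scraped HTML.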

Andrew Gremlich