I'm really new to web development, so please forgive the basic question. I am trying to make a website that takes data from a British Cycling event and then analyses it. The main trouble I'm having is that, to get the table, you have to click a "view entrants" button, which I think runs some JavaScript that brings up the table. So how would I go about scraping the data for a given event?

Thanks in advance.

Here is an example: https://www.britishcycling.org.uk/events/details/141520/London-Dynamo-Summer-Road-Race-2016

Red

1 Answer

First, make sure that britishcycling.org.uk allows you to scrape the data.

Then: the event page URL contains the event ID; in your example it is 141520. With that event ID, the website requests this URL: https://www.britishcycling.org.uk/events_version_2/ajax_get_organisation_events?event_id=141520

As you can see, the number 141520 is the only part that changes between events.
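To make that concrete, here is a minimal sketch of getting from an event page URL to the AJAX URL. The function names `extractEventId` and `buildEntrantsUrl` are made up for illustration, and the pattern assumes event pages always look like `/events/details/<id>/<slug>`, which I have only seen in the example above.

```php
<?php

// Hypothetical helper: pull the numeric event ID out of an event page URL.
// Assumes the path has the form /events/details/<id>/<slug>.
function extractEventId(string $eventPageUrl): ?int
{
    if (preg_match('#/events/details/(\d+)(?:/|$)#', $eventPageUrl, $m) === 1) {
        return (int) $m[1];
    }
    return null; // the URL did not look like an event page
}

// Hypothetical helper: build the AJAX URL for a given event ID.
function buildEntrantsUrl(int $eventId): string
{
    return 'https://www.britishcycling.org.uk/events_version_2/ajax_get_organisation_events?'
        . http_build_query(['event_id' => $eventId]);
}

$id = extractEventId('https://www.britishcycling.org.uk/events/details/141520/London-Dynamo-Summer-Road-Race-2016');
if ($id !== null) {
    echo buildEntrantsUrl($id), PHP_EOL;
}
```

Returning `null` for an unrecognised URL lets the caller decide what to do instead of silently requesting a wrong ID.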

The problem is that a plain request to this URL returns a full HTML page without the content you are looking for. If you add the X-Requested-With: XMLHttpRequest header, you will receive the right data.

Here is the PHP code (generated with Postman):

<?php

// Request the entrants data the same way the page's own AJAX call does.
$curl = curl_init();

curl_setopt_array($curl, array(
  // event_id is the number taken from the event page URL.
  CURLOPT_URL => "https://www.britishcycling.org.uk/events_version_2/ajax_get_organisation_events?event_id=146685",
  CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
  CURLOPT_ENCODING => "",         // accept any encoding cURL supports
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "POST",
  CURLOPT_HTTPHEADER => array(
    "cache-control: no-cache",
    // Without this header the server returns the full HTML page instead.
    "x-requested-with: XMLHttpRequest"
  ),
));

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
  echo "cURL Error: " . $err;
} else {
  echo $response;
}
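Once you have the response, you still need to get the rows out of it. The sketch below assumes the response is an HTML fragment containing an ordinary `<table>` with `<tr>`, `<th>` and `<td>` elements. I have not checked the actual markup, so treat the sample HTML and the helper name `tableRows` as placeholders.

```php
<?php

// Hypothetical helper: turn every <tr> in an HTML fragment into an array of cell texts.
function tableRows(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prefix is a common trick to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup; the real response may be structured differently.
$sample = '<table><tr><th>Name</th><th>Club</th></tr>'
        . '<tr><td>A. Rider</td><td>Example CC</td></tr></table>';
print_r(tableRows($sample));
```

In the cURL script above you would call `tableRows($response)` instead of echoing the response.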
Wesley Abbenhuis
  • Thanks, but (probably a stupid question) how did you get that URL? I couldn't find anything like that in the source for the button. – Red Jul 19 '16 at 14:26
  • With Chrome, you can open Developer Tools (Ctrl+Shift+I). Then go to the Network tab and filter by XHR requests. Reload the page and click the link to open the events. You will then see a request appear; click it to see its details. – Wesley Abbenhuis Jul 19 '16 at 14:28
  • Would you happen to know why file_get_contents() errors with this URL? It says "failed to open stream". – Red Jul 19 '16 at 17:05
  • This question had the same problem, with a solution: http://stackoverflow.com/questions/697472/why-file-get-contents-returns-failed-to-open-stream-http-request-failed – Wesley Abbenhuis Jul 20 '16 at 06:24
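
If you would rather stay with file_get_contents(), note that a bare call sends a plain GET without the extra header, so at the very least it needs a stream context. The following is a rough sketch that mirrors the cURL options from the answer. The helper names `buildAjaxContext` and `fetchWithContext` are invented, and the thread does not establish why the original call failed, so this may not be the whole fix.

```php
<?php

// Hypothetical helper: build a stream context mirroring the cURL request in the answer.
function buildAjaxContext()
{
    return stream_context_create([
        'http' => [
            'method'        => 'POST',
            'header'        => "X-Requested-With: XMLHttpRequest\r\n"
                             . "Cache-Control: no-cache\r\n",
            'timeout'       => 30,
            // Return the body even on 4xx/5xx responses so it can be inspected.
            'ignore_errors' => true,
        ],
    ]);
}

// Hypothetical wrapper: returns the response body, or null if the request could not be made.
function fetchWithContext(string $url): ?string
{
    $body = @file_get_contents($url, false, buildAjaxContext());
    return $body === false ? null : $body;
}
```

You would call it as `fetchWithContext($url)` with the AJAX URL from the answer. For this to work on an https URL, `allow_url_fopen` must be enabled and the openssl extension must be loaded.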