I am trying to extract data from URL: http://scores.espn.go.com/nba/scoreboard?date=20150410

<?php
include('simple_html_dom.php');

function dlPage($href) {

$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $href);
curl_setopt($curl, CURLOPT_REFERER, $href);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
$str = curl_exec($curl);
curl_close($curl);

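$html = str_get_html($str);

// The selector must be passed as a quoted string.
foreach ($html->find('div[id=events]') as $elm) {
    var_dump($elm->plaintext); exit;
    // This var_dump returns an empty string.
}

// Return the parsed document ($dom was never defined).
return $html;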
}

$url = 'http://scores.espn.go.com/nba/scoreboard?date=20150410';
$data = dlPage($url);
print_r($data);

?>

Whenever I try to access the inner article tags, I always get null or an empty array. How can I access and extract the match-score data inside the HTML5 article tags?

Farjad Hasan
  • Maybe look at using XPATH. Find it the easiest approach usually. http://php.net/manual/en/simplexmlelement.xpath.php / http://php.net/manual/en/class.domxpath.php Google chrome makes it very easy to retrieve. `//*[@id="teams"]/tr[2]/td[6]/span/text()` – ficuscr Apr 17 '15 at 19:42
  • You are looking for a `div` with and ID of `events`. That exists in the page but it is indeed an empty element, at least on page load. It might get filled using ajax, but you will not get that information when you use cURL to get the page. Or any other method that does not parse the page and execute the javascript. – jeroen Apr 17 '15 at 19:44
  • @jeroen bcoz the data I am trying to extract is in article tags which lies inside div with id=events – Farjad Hasan Apr 17 '15 at 19:47
  • @jeroen oh I see, then is there any alternate solution? – Farjad Hasan Apr 17 '15 at 19:48
  • If they don't offer an API, I'm afraid not. You could of course try to see what ajax call they make and try to do that directly in cURL but that should not work if they are any good at their job :-) – jeroen Apr 17 '15 at 19:50
  • @ficuscr can u please explain to me in little detail how are u using Google Chrome for this purpose? Sorry, I am new to web scrapping. – Farjad Hasan Apr 17 '15 at 19:50
  • It's generated by js – hytest Apr 17 '15 at 19:53
  • I think they are running some kind of cron that update data every 1 minute or so. – Farjad Hasan Apr 17 '15 at 19:58
  • @NuttyProgrammer re. xpath see: http://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome and say http://stackoverflow.com/questions/13718500/using-xpath-with-php-to-parse-html – ficuscr Apr 17 '15 at 20:14

1 Answer

You are looking for a div with an ID of events. That element exists in the page, but it is empty, at least on page load. It gets filled using Ajax, so you will not get that information when you fetch the page with cURL, or with any other method that does not parse the page and execute the JavaScript.

However, you are in luck. They are making an Ajax call to:

http://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard?calendartype=blacklist&dates=20150410

And you can easily do the same.

It returns the information as a JSON string, which is easy to parse with json_decode in PHP. Afterwards you will have a nested object or array, and you can display the data as you please.
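
As a rough sketch of that approach (not necessarily how it has to be done), the following fetches the Ajax endpoint with cURL and decodes the result with json_decode. The helper names fetchBody and summarizeScoreboard are made up for illustration, and the field names read from the decoded array (events, competitions, competitors, team.displayName, score) are assumptions about the response layout that should be checked against the actual JSON before relying on them.

```php
<?php
// Fetch a URL with cURL and return the response body, or null on failure.
function fetchBody(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($curl);
    return $body === false ? null : $body;
}

// Decode the JSON string and build one summary line per game.
// ASSUMPTION: the keys used below are guesses at the response layout;
// anything missing is skipped or shown as '?'.
function summarizeScoreboard(string $json): array
{
    $data = json_decode($json, true); // true => nested associative arrays
    if (!is_array($data)) {
        return []; // not valid JSON, or not an object/array at the top level
    }

    $lines = [];
    foreach ($data['events'] ?? [] as $event) {
        $parts = [];
        foreach ($event['competitions'][0]['competitors'] ?? [] as $competitor) {
            $name  = $competitor['team']['displayName'] ?? '?';
            $score = $competitor['score'] ?? '?';
            $parts[] = $name . ' ' . $score;
        }
        $lines[] = implode(' - ', $parts);
    }
    return $lines;
}

// Usage (performs a network request, so it is left commented out):
// $url  = 'http://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard?calendartype=blacklist&dates=20150410';
// $json = fetchBody($url);
// if ($json !== null) {
//     print_r(summarizeScoreboard($json));
// }
```

Keeping the download and the parsing in separate functions makes it possible to run var_dump on the decoded array first and adjust the keys if the real layout differs.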

Please note: I don't know whether you are allowed to do that, so how you use this is up to you. You could search their site to see whether they offer their API publicly and under what conditions.

jeroen
  • but I believe that there should be some kind of authentication to access their api. – Farjad Hasan Apr 17 '15 at 20:02
  • @NuttyProgrammer Nope, I tried it from another browser before I posted this and without opening the web-page there first, I get the complete json. They could be doing something that is ip based but you would have to try that. – jeroen Apr 17 '15 at 20:02
  • @NuttyProgrammer Just tried it from my phone on another internet connection and it opens without any form of authentication. – jeroen Apr 17 '15 at 20:06