This is not a duplicate of the linked question: that one explains alternatives to regexp, but it doesn't offer a solution to this problem.
I am trying to use preg_match_all to parse a scraped page (http://www.sportsbookreview.com/betting-odds/). I have tested my regexp />([A-Z]+) - / on two sites (http://www.phpliveregex.com/ and functions-online.com/preg_match_all.html) and it works on both. I have also pasted the snippet I am parsing directly into my code as a string, and it works there too. But when I run it on the live data, it returns no results.
My only theory is that the live page contains a hidden character that gets lost when I copy and paste the snippet into the online testers.
The full code is below. Thanks for your help.
<?php
// Fetch a URL with cURL and return the response body (false on failure).
function curl($url) {
    $curlAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_USERAGENT, $curlAgent);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$strUrl = 'http://www.sportsbookreview.com/betting-odds/';
$data = curl($strUrl);

// Split the page on '@id'; element 0 is whatever precedes the first '@id'.
$strGames = explode('@id', $data);
echo "<br>Number of games on page: " . (count($strGames) - 1) . "<br>";

for ($i = 1; $i < count($strGames); $i++) {
    // echo $strGames[$i];
    // Strip Unicode category C ("Other": control, format, ...) characters, keeping whitespace.
    $clean = preg_replace('/[^\PC\s]/u', '', $strGames[$i]);
    // preg_match_all returns the number of full matches, or false on error.
    $matchCount = preg_match_all("~>([A-Z]+) - ~m", $clean, $strTeams);
    var_dump($strTeams);
}
?>
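
One way to check the hidden-character theory would be to print the scraped text with every byte outside printable ASCII made visible, so that anything lost in copy and paste shows up. This is only a minimal sketch: revealHiddenBytes is a hypothetical helper, and $sample is a made-up string (not taken from the live page) with a non-breaking space where the regexp expects a plain space.

```php
<?php
// Hypothetical helper: show every byte outside printable ASCII as \xNN.
// No /u modifier, so the pattern works byte by byte, not per UTF-8 character.
function revealHiddenBytes(string $s): string {
    return preg_replace_callback(
        '/[^\x20-\x7E]/',
        function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $s
    );
}

// Made-up sample: UTF-8 non-breaking space (bytes C2 A0) instead of a plain space.
$sample = ">NYY\xC2\xA0- 7";

echo revealHiddenBytes($sample), "\n";   // prints >NYY\xC2\xA0- 7

// The regexp from the question finds nothing in this sample.
var_dump(preg_match_all('~>([A-Z]+) - ~', $sample, $m));   // int(0)
```

On the live data, the same helper could be applied to $strGames[$i] (or a short substr of it around a team name) before running preg_match_all, to see which bytes actually sit between the abbreviation and the hyphen.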