Can you see where my regular expression to match the latitude and longitude from some scraped HTML is going wrong?

The script should use file_get_contents to load some HTML, then use a regular expression with preg_match to extract the latitude and longitude. Currently the script below outputs blank values for both latitude and longitude. I'm not sure what is wrong, and regular expressions are not a strong area for me. Thanks.

$url = 'http://www.homebase.co.uk/webapp/wcs/stores/servlet/StoreLocatorFlow?slsid=658';

$scrapedPage = file_get_contents($url);

The returned HTML contains the following line:

<p class="geo"> <abbr class="latitude" title="52.19166">52.19166</abbr> <abbr class="longitude" title="-2.23108">-2.23108</abbr> </p> 

We then do preg_match:

preg_match('/class="latitude"\s*title="([^"]+)"/', $scrapedPage, $lat);
preg_match('/class="longitude"\s*title="([^"]+)"/', $scrapedPage, $lon);

echo '<latitude>'.$lat[1].'</latitude>';
echo '<longitude>'.$lon[1].'</longitude>';
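
A minimal sketch of a more defensive version of the same extraction, assuming the page contains the markup shown above. `extractTitle` is a hypothetical helper, not part of the original script. It checks the return value of preg_match before reading the capture group, so a page without the expected markup (for example an error page) produces an explicit message instead of blank output.

```php
<?php
// Hypothetical helper (not in the original script): returns the captured
// title value for the given class, or null when the pattern is not found.
function extractTitle(string $html, string $class): ?string
{
    $pattern = '/class="' . preg_quote($class, '/') . '"\s*title="([^"]+)"/';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// The sample line quoted in the question, used here instead of a live download.
$sample = '<p class="geo"> <abbr class="latitude" title="52.19166">52.19166</abbr> '
        . '<abbr class="longitude" title="-2.23108">-2.23108</abbr> </p>';

$lat = extractTitle($sample, 'latitude');
$lon = extractTitle($sample, 'longitude');

if ($lat === null || $lon === null) {
    // The regex did not match: inspect what was actually downloaded.
    echo "Coordinates not found; check the downloaded HTML with var_dump().\n";
} else {
    echo '<latitude>' . $lat . '</latitude>';
    echo '<longitude>' . $lon . '</longitude>';
}
```

With the real download, passing `$scrapedPage` instead of `$sample` would show whether the problem lies in the pattern or in the fetched content.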
Ben Paton
  • http://3v4l.org/TSiqn Works for me. There must be something in `$scrapedPage` that's breaking it. – Ben Fortune Nov 04 '13 at 16:13
  • Seems to be working fine for me too, though you should use DOM to parse it. – anubhava Nov 04 '13 at 16:13
  • The link in the OP redirects to an error page. Are you sure you're getting the data you want? – Ben Fortune Nov 04 '13 at 16:16
  • Hmm, I think you need an auth cookie or a session before you can hit that URL. So the issue seems to be not with the regular expression but with accessing the URL. – Ben Paton Nov 04 '13 at 16:31
  • Ben's comment is probably the key. I suspect you haven't tried `var_dump($scrapedPage)` and you're assuming that the download went fine. I'd say the remote site is doing a simple referrer check. – Álvaro González Nov 04 '13 at 16:31
  • I think I solved it. I needed to have some session cookies set. The cURL method discussed here works: http://stackoverflow.com/questions/13210140/how-can-i-scrape-website-content-in-php-from-a-website-that-requires-a-cookie-lo – Ben Paton Nov 04 '13 at 17:15
  • This question appears to be off-topic because it is not about HTML parsing. – Álvaro González Nov 05 '13 at 08:10
  • This question appears to be off-topic because there is no problem to solve here. The OP's problem turned out to be elsewhere; thankfully, he found a solution. – Wayne Conrad Mar 19 '14 at 21:39
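
The comments suggest two changes: fetching the page with session cookies via cURL, and parsing with DOM rather than a regular expression. The following is a minimal sketch of both, assuming the cURL and DOM extensions are installed. `fetchWithCookies`, `extractGeo`, and the cookie file path are hypothetical names chosen for illustration, and whether cookies alone satisfy the remote site has not been verified here.

```php
<?php
// Hypothetical helper: download a URL while storing and resending cookies.
// Returns null when the request fails.
function fetchWithCookies(string $url, string $cookieFile): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // send stored cookies from here
    ]);
    $html = curl_exec($ch);

    return $html === false ? null : $html;
}

// Hypothetical helper: read the title attributes of the latitude and
// longitude <abbr> elements. Returns null if either is missing.
function extractGeo(string $html): ?array
{
    if ($html === '') {
        return null; // nothing was downloaded
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence markup warnings
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $lat = $xpath->evaluate('string(//abbr[@class="latitude"]/@title)');
    $lon = $xpath->evaluate('string(//abbr[@class="longitude"]/@title)');

    if ($lat === '' || $lon === '') {
        return null; // expected markup not present, e.g. an error page
    }

    return ['latitude' => $lat, 'longitude' => $lon];
}

// Usage against the live site (not run here, needs network access):
// $html = fetchWithCookies($url, sys_get_temp_dir() . '/cookies.txt');
// $geo  = $html === null ? null : extractGeo($html);

// Demonstration on the sample line quoted in the question.
$sample = '<p class="geo"> <abbr class="latitude" title="52.19166">52.19166</abbr> '
        . '<abbr class="longitude" title="-2.23108">-2.23108</abbr> </p>';
$geo = extractGeo($sample);
var_dump($geo);
```

If the site only issues its session cookie on an earlier page, one possible approach is to call `fetchWithCookies` on that page first with the same cookie file, then request the store URL.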

0 Answers