Hi, I am very new to screen scraping. I am trying to scrape reviews from a hotel booking website to display on mine.

I've got this far but am a bit stuck. Can anyone help?

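<?php
// Fetch the page; file_get_contents() returns false on failure.
$data = file_get_contents('http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
if ($data === false) {
    die('Could not fetch the page.');
}
// "/" is the pattern delimiter, so the one in </div> must be escaped.
// The "s" modifier lets "." match newlines inside the div.
$regex = '/<div id="summary">(.+?)<\/div>/s';
if (preg_match($regex, $data, $match)) {
    echo $match[1];
}
?>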
  • Ask them for an API, otherwise they probably don't want you taking their reviews. – 472084 Aug 14 '12 at 14:41
  • Screen scraping is **NOT** a good idea. The content of the website you are scraping keeps changing in terms of semantics and internals, so your site will break all over the place and look really bad. Additionally, you can run into legal issues. Better to look for sites/services with APIs, RSS, or some other way of syndication, as Jleagle suggested. – raidenace Aug 14 '12 at 14:45
  • The canonical answer for anything relating to html + regexes: http://stackoverflow.com/a/1732454/118068 – Marc B Aug 14 '12 at 14:48
  • I am an affiliate of this website. Their API currently doesn't support guest reviews, and I have permission to scrape; I just don't know how to do it. – Westfield Sandown Aug 14 '12 at 14:51
  • possible duplicate of [Screen scrape using php and fopen](http://stackoverflow.com/questions/11957038/screen-scrape-using-php-and-fopen) – Jürgen Thelen Aug 15 '12 at 22:33

1 Answer

Use DOMDocument:

<?php
  define('URL', 'http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
  $doc = new DOMDocument();
  $doc->loadHTML(file_get_contents(URL));
  $summary = $doc->getElementById('summary');
  // other lookups are available too, e.g. $doc->getElementsByTagName()
  var_export($summary);
?>

Also, for more complicated queries, consider XPath, a path-based query language for selecting nodes, available in PHP through the DOMXPath class.
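
As a rough illustration of such a query, here is a minimal sketch. It has not been run against the real page: the markup is a made-up stand-in, and the `review` class name is an assumption, so the expression would need adjusting to the page's actual structure.

```php
<?php
// Hypothetical markup standing in for the reviews page.
// The "review" class name is an assumption, not taken from the real site.
$html = '<html><body>
<div id="summary">
  <div class="review"><p>Lovely stay.</p></div>
  <div class="review"><p>Great breakfast.</p></div>
</div>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOMXPath evaluates XPath expressions against a loaded document.
$xpath = new DOMXPath($doc);

// Select every <p> inside a div.review that sits anywhere under #summary.
$nodes = $xpath->query('//div[@id="summary"]//div[@class="review"]/p');

$reviews = [];
foreach ($nodes as $node) {
    // textContent is the node's text with all markup stripped.
    $reviews[] = trim($node->textContent);
}

print_r($reviews);
```

Note that `@class="review"` matches only when the attribute is exactly `review`; elements carrying several classes would need a `contains()` test instead.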

Stefan
  • Thanks for your reply! It seems this returns several errors: Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 9 in /homepages/************/hotel.php on line 377 Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 9 in /homepages/28/d282373443/htdocs/eurobooker/hotel.php on line 377 – Westfield Sandown Aug 14 '12 at 14:55
  • DOMElement::__set_state(array( )) – Westfield Sandown Aug 14 '12 at 14:57
  • It depends on whether the file you are loading has errors or not. There are different load methods: loadHTML, loadXML, load. Check them out, try the simple examples, and see if that is the functionality you need. Once you get a grip on it, you can apply it to a real live webpage. Generally, though, it has to be valid HTML for it to work. – Stefan Aug 14 '12 at 15:10
  • I changed it to load. There are no errors now; it just prints NULL. – Westfield Sandown Aug 14 '12 at 15:19
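
The warnings and the NULL result reported in the comments above can probably be worked around along these lines. This is a sketch under assumptions, not a tested fix for the real page: `$html` is a made-up stand-in containing an unescaped `&` of the kind that usually causes the htmlParseEntityRef warning. It keeps `loadHTML()`, silences the parser warnings with `libxml_use_internal_errors()`, and reads the element's text instead of dumping the object. The last part shows one likely reason for NULL after switching loaders: in a document parsed as XML, `getElementById()` finds nothing unless a DTD (or `setIdAttribute()`) marks `id` as an ID attribute.

```php
<?php
// Hypothetical stand-in for the fetched page. The unescaped "&" in the
// link is the kind of markup that typically triggers parser warnings.
$html = '<html><body><div id="summary">Lovely stay. '
      . '<a href="more?a=1&b=2">more</a></div></body></html>';

// Collect libxml warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// In an HTML document, "id" attributes are recognised as IDs.
$summary = $doc->getElementById('summary');
$text = ($summary !== null) ? trim($summary->textContent) : null;
echo $text, PHP_EOL;

// The same lookup in a document parsed as XML, with no DTD declaring
// "id" as an ID attribute, returns null.
$xml = new DOMDocument();
$xml->loadXML('<root><div id="summary">Lovely stay.</div></root>');
$fromXml = $xml->getElementById('summary');
var_dump($fromXml);
```

For the live page, the `$html` string would be replaced by the result of `file_get_contents(URL)`, after checking that it is not `false`.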