
Possible Duplicate:
Screen scraping in PHP using file_get_contents

Can anyone help me? I am trying to scrape hotel reviews from LateRooms.com. Don't tell me it's a bad idea, because I already have permission as an affiliate.

My code:

<?php
header('Content-Type: text/plain');

$contents = file_get_contents('http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx');
$contents = preg_replace('/\s{1,}/', ' ', $contents); // collapse runs of whitespace

print $contents . "\n";

$records = preg_split('/<div id="review/', $contents);

for ($ix = 1; $ix < count($records); $ix++) {
    $tmp = $records[$ix];

    preg_match('/id="review"/', $tmp, $match_reviews);
    print_r($match_reviews);

    exit();
}
?>

This works really well; the only problem is that it pulls in the whole page of code and doesn't match the div id 'review'.

Thanks in advance.

Community

1 Answer

// Fetch a URL with cURL, following redirects.
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

// Return the inner HTML of a DOM element by serialising each child node.
function DOMinnerHTML($element) {
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}

$url  = 'http://www.laterooms.com/en/hotel-reviews/238902_the-westfield-bb-sandown.aspx';
$html = file_get_contents_curl($url);

// Parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from malformed HTML

$reviews = array(); // initialise so print_r() works even when nothing matches
$div_elements = $doc->getElementsByTagName('div');

if ($div_elements->length != 0) {
    foreach ($div_elements as $div_element) {
        if ($div_element->getAttribute('class') == 'review newReview') {
            $reviews[] = DOMinnerHTML($div_element);
        }
    }
}

print_r($reviews);

Try this; it will return all the reviews. You can then refine the content as per your requirements.
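As a variant (my own sketch, not part of the original answer), you can let DOMXPath select the matching divs directly instead of looping over every div on the page. The class name 'review newReview' is taken from the code above; adjust it if the site's markup differs.

```php
<?php
// Sketch: pull out review divs with an XPath query rather than
// scanning every <div>. Assumes the class is exactly "review newReview".
function get_reviews_xpath($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed HTML
    $xpath = new DOMXPath($doc);

    $reviews = array();
    // Matches divs whose class attribute is exactly "review newReview".
    foreach ($xpath->query('//div[@class="review newReview"]') as $div) {
        $reviews[] = trim($doc->saveHTML($div));
    }
    return $reviews;
}

// Usage, reusing file_get_contents_curl() from the answer:
// $html    = file_get_contents_curl($url);
// $reviews = get_reviews_xpath($html);
// print_r($reviews);
?>
```

This avoids the helper loop over childNodes entirely, at the cost of an exact-match on the class attribute (XPath 1.0 has no class-contains shorthand).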

Abhishek