
I am trying to scrape data from a web page, but the DOM approach gives warnings. So I want to know: is it possible to scrape the date, review text, and rating value from this page with regex?

http://www.yelp.com/biz/franchino-san-francisco?start=80

Here is the DOM version, which gives an error:

https://eval.in/143074

The same approach works for a smaller piece of code: https://eval.in/143036

Is it possible using regex?

<?php
// Fetch the page; file_get_contents() returns false on failure.
$html = file_get_contents('http://www.yelp.com/biz/franchino-san-francisco?start=80');
if ($html === false) {
    exit('Could not fetch the page.');
}

// The markup is parsed as fetched: escapeshellarg() and nl2br() would alter the HTML.
// Collect libxml warnings about malformed markup instead of printing them.
libxml_use_internal_errors(true);

// Parse once and reuse the same document for every query.
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Print the text of the first element whose class attribute matches exactly.
foreach (array('rating-qualifier', 'review_comment ieSucks') as $classname) {
    $results = $xpath->query("//*[@class='" . $classname . "']");
    if ($results->length > 0) {
        echo $results->item(0)->nodeValue;
    }
}

// Print the content attribute of the first <meta> element, if there is one.
$meta = $dom->getElementsByTagName('meta');
if ($meta->length > 0) {
    echo $meta->item(0)->getAttribute('content');
}
?>
tripleee
user2129623
  • see this : http://stackoverflow.com/questions/13986359/test-php-script-online – aelor Apr 28 '14 at 07:54
  • try running your code on your local machine, what error do you get there ? – aelor Apr 28 '14 at 08:08
  • Using only "regular" regex this is only possible if the site structure is guaranteed to never change and you know it exactly, because `HTML is no regular language` http://blog.codinghorror.com/parsing-html-the-cthulhu-way/ – DrCopyPaste Apr 28 '14 at 08:17
  • @aelor: it gives an error for malformed HTML, similar to `Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 756 in F:\wamp\www\htdocs\thenwat\yelp.php on line 23` – user2129623 Apr 28 '14 at 10:18
  • I can suppress these errors using `libxml_use_internal_errors(true)`. The solution above is taken from one of your replies on a different thread – user2129623 Apr 28 '14 at 10:19
  • @DrCopyPaste: thanks, any hint for this? – user2129623 Apr 28 '14 at 10:20
  • @Programming_crazy don't do much php over here, but this looks promising: http://stackoverflow.com/a/3577662/2186023 – DrCopyPaste Apr 28 '14 at 10:27
  • Now that all `eval.in` demo links are dead, this page is missing its [mcve]. Please statically provide sample input in your question body. – mickmackusa Aug 24 '22 at 01:02
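
The comments above mention `libxml_use_internal_errors(true)` and ask for static sample input. The following is only a minimal sketch of that idea, not a tested scraper for the real page: the HTML fragment and its values are made up for illustration (this is not Yelp's actual markup), and the helper name `firstTextByClass` is hypothetical. It buffers libxml's parse warnings instead of printing them, then looks elements up by a single class token, so an element with `class="review_comment ieSucks"` can be found by `review_comment` alone.

```php
<?php
// Hypothetical sample input: NOT Yelp's real markup, just a small fragment
// with a bare "&" of the kind that libxml may complain about.
$html = <<<'HTML'
<html><head><meta name="description" content="Sample description"></head>
<body>
  <div class="review">
    <span class="rating-qualifier">4/28/2014</span>
    <p class="review_comment ieSucks">Great pasta & friendly staff.</p>
  </div>
</body></html>
HTML;

// Hypothetical helper: text of the first element carrying the given class token.
function firstTextByClass(DOMXPath $xpath, string $class): ?string
{
    // Match one class token even when the attribute lists several classes.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $nodes = $xpath->query($query);
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

// Buffer libxml warnings instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Keep the collected messages in case they are worth logging, then reset.
$parseErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath  = new DOMXPath($dom);
$date   = firstTextByClass($xpath, 'rating-qualifier');
$review = firstTextByClass($xpath, 'review_comment');

echo $date, "\n", $review, "\n";
```

Matching on a class token rather than the full attribute string is a design choice that tolerates extra or reordered classes; it still breaks if the site renames the class, which is the same fragility the regex comment above warns about.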

0 Answers