
I'm trying to use PHP to parse a string and extract some information. Part of the content looks like this:

<div>All Versions:</div> 
<div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'><div>

What's the easiest way in PHP to get these two numbers?

(1) the number of stars, which is 5

(2) the number of ratings, which is 193984

P.S. Please don't treat this as HTML parsing; treat it as a plain string.

Joe Huang

3 Answers


XML parser enthusiasts would suggest using a parser to grab the attribute from the div.

$html = "<div>All Versions:</div><div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'></div>";

$xml = new XMLReader(); // Set up the parser
// XML allows only one root element, so wrap the two sibling divs
$xml->XML('<root>' . $html . '</root>');

$stars = $ratings = null;
while ($xml->read()) { // Walk through each node
    // Only element nodes carry attributes; look for class="rating"
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->getAttribute('class') === 'rating') {
        // Split aria-label ("5 stars, 193984 Ratings") into its two parts
        list($stars, $ratings) = explode(', ', $xml->getAttribute('aria-label'));
        $stars = intval($stars); // Keep only the leading integer of each part
        $ratings = intval($ratings);
        break;
    }
}

$xml->close();

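However, this depends on how you want to identify the div. If there are other identifiers you would like to include (perhaps more specific ones, such as an id), you can add them to the if statement. Also note that XMLReader needs well-formed input: the snippet in the question ends with `<div>` instead of `</div>`, which a strict XML parser will reject.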
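As a rough sketch of matching on more than one identifier, the XMLReader loop above could require both `class='rating'` and `role='img'` before reading `aria-label`. The helper name `findRating()` is hypothetical, and the sketch assumes the label always has the form "N stars, M Ratings".

```php
<?php
// Sketch: same XMLReader walk as above, but the element must match
// several attributes (class AND role) before its aria-label is used.
// findRating() is a hypothetical helper name.
function findRating(string $html): ?array
{
    $reader = new XMLReader();
    // XML requires a single root element, so wrap the fragment.
    if (!$reader->XML('<root>' . $html . '</root>')) {
        return null;
    }

    $result = null;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->getAttribute('class') === 'rating'
            && $reader->getAttribute('role') === 'img') {
            // Assumed label format: "N stars, M Ratings"
            $parts = explode(', ', $reader->getAttribute('aria-label') ?? '');
            if (count($parts) === 2) {
                $result = [intval($parts[0]), intval($parts[1])];
            }
            break;
        }
    }
    $reader->close();
    return $result;
}

$html = "<div>All Versions:</div><div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'></div>";
$rating = findRating($html); // [5, 193984]
```

Adding further conditions to the `if` (for example an `id` check) narrows the match without changing the rest of the loop.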

Bailey Parker

Once you have isolated this part of the page (whether DOM parsing or not), you can extract the two numbers pretty easily with:

if (preg_match('#(\d+) stars, (\d+) Ratings#i', $source, $match)) {
    list(, $stars, $ratings) = $match; // $match[0] is the full match, so skip it
}

Note that this pattern is tailored to your example. If other human-readable attributes are present in other cases, or the parts appear in a different order, you would need to, e.g., split the string on commas and then search each part individually for "stars" or "Ratings".
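The order-independent approach described above might look roughly like this. It is a sketch under the assumption that both counts are whole numbers followed by the word "star(s)" or "rating(s)"; a fractional value such as "4.5 stars" would not be handled correctly. The helper name `extractCounts()` is hypothetical.

```php
<?php
// Sketch: split the label on commas, then look for "stars" / "ratings"
// in each part separately, so the order of the parts doesn't matter.
// extractCounts() is a hypothetical helper name; integer counts are assumed.
function extractCounts(string $label): array
{
    $counts = ['stars' => null, 'ratings' => null];
    foreach (explode(',', $label) as $part) {
        // Each part is expected to look like "<number> <word>"
        if (preg_match('/(\d+)\s+(stars?|ratings?)\b/i', $part, $m)) {
            $key = stripos($m[2], 'star') === 0 ? 'stars' : 'ratings';
            $counts[$key] = (int) $m[1];
        }
    }
    return $counts;
}

$counts = extractCounts('5 stars, 193984 Ratings');  // ['stars' => 5, 'ratings' => 193984]
$swapped = extractCounts('193984 Ratings, 5 stars'); // same values, different order
```

Parts that match neither word are simply ignored, and a missing count stays `null`.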

mario
  • Did someone with >10k posts suggest using Regex to parse an XML string? Gasp! :O – Bailey Parker Aug 31 '11 at 06:38
  • No XML. No "parsing". Just string *extraction*. (Which despite the dated SO meme is quite workable on HTML anyways.) – mario Aug 31 '11 at 06:43
  • Funnily enough, I was first going to suggest regex, but then I feared the wrath I would get from everyone for even considering it. Honestly, if the string is just two lines long, using regex is much easier. – Bailey Parker Aug 31 '11 at 06:45
  • Yes, the advantage of being a >10k user is that we can make such suggestions mostly unscolded. I'm always aiming for the laziest thing, which here might be QueryPath/DOM first and the regex only afterwards. (The OP will figure this out.) – mario Aug 31 '11 at 06:48
  • I must say I envy your reputation! And yes, agreed. Especially for designers or people used to jQuery's Sizzle selectors, QueryPath makes the most sense. – Bailey Parker Aug 31 '11 at 06:59
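The "DOM first, regex afterwards" combination mentioned in the comments could look roughly like this. The sketch uses PHP's built-in DOMDocument and DOMXPath rather than QueryPath, and the helper name `ratingFromHtml()` is hypothetical.

```php
<?php
// Sketch of "DOM first, regex afterwards" with PHP's built-in DOM extension
// (QueryPath is not used). ratingFromHtml() is a hypothetical helper name.
function ratingFromHtml(string $html): ?array
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // loadHTML() tolerates sloppy markup such as an unclosed <div>;
    // collect its warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 1 (DOM): isolate the aria-label of the rating div
    $xpath = new DOMXPath($doc);
    $label = $xpath->evaluate("string(//div[@class='rating']/@aria-label)");

    // Step 2 (regex): pull the two numbers out of the label text
    if (preg_match('/(\d+) stars, (\d+) Ratings/i', $label, $m)) {
        return ['stars' => (int) $m[1], 'ratings' => (int) $m[2]];
    }
    return null;
}

$html = "<div>All Versions:</div> \n<div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'><div>";
$result = ratingFromHtml($html); // ['stars' => 5, 'ratings' => 193984]
```

Because the HTML parser is forgiving, this works on the question's snippet as-is, including the unclosed `<div>`.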
$string = "<div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'><div>";
$pattern = '/aria-label=\'(\d+) stars, (\d+) Ratings\'/';
preg_match($pattern, $string, $matches); // $matches[1] = stars, $matches[2] = ratings
echo "<pre>";
print_r($matches);
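A small variant of the pattern above uses named capture groups, so the numbers can be read by name instead of by index. This is a sketch; the group names `stars` and `ratings` are chosen for illustration.

```php
<?php
// Variant of the pattern above with named capture groups
// (the group names "stars" and "ratings" are illustrative).
$string = "<div class='rating' role='img' tabindex='-1' aria-label='5 stars, 193984 Ratings'><div>";
$pattern = "/aria-label='(?<stars>\d+) stars, (?<ratings>\d+) Ratings'/";

$stars = $ratings = null;
if (preg_match($pattern, $string, $matches)) {
    $stars = (int) $matches['stars'];     // 5
    $ratings = (int) $matches['ratings']; // 193984
}
```

Named groups still appear under their numeric indexes in `$matches` as well, so existing index-based code keeps working.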
TROODON