Scrape a number from separate spans

Question

I need to scrape the number 622104 from this HTML.

How can I get the number?

<div class="numbersBackground">
        <div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl00_numberPanel" class="number">
        <div class="numberWrapper"><span>6</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl01_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl02_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl03_commaPanel" class="comma">

    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl04_numberPanel" class="number">
        <div class="numberWrapper"><span>1</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl05_numberPanel" class="number">
        <div class="numberWrapper"><span>0</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl06_numberPanel" class="number">
        <div class="numberWrapper"><span>4</span></div>
    </div>
</div>

Pascal MARTIN · Accepted Answer · 2011-03-16T19:24:58.227

Using the DOMDocument class to parse the HTML string (through its loadHTML method), you could use an XPath query (via the DOMXPath class) to find all <div> tags with a class="numberWrapper" attribute.

Then, iterate over those, concatenating their content to a variable -- which, at the end of the loop, will contain your number.

For example, you could use code like this:

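// The HTML to parse, as a string
$str = <<<HTML
... HERE YOUR HTML ...
HTML;

// Will hold the digits, concatenated in document order
$number = '';

$dom = new DOMDocument();
if ($dom->loadHTML($str)) {
    $xpath = new DOMXPath($dom);
    // Select every <div> whose class attribute is exactly "numberWrapper"
    $results = $xpath->query('//div[@class="numberWrapper"]');
    foreach ($results as $div) {
        // nodeValue is the text content of the <div>, i.e. one digit
        $number .= $div->nodeValue;
    }
}

var_dump($number);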

As output, you'd get (this is the Xdebug-formatted var_dump; plain PHP prints string(6) "622104"):

string '622104' (length=6)

You could also use the following XPath query, to make sure you're only working with the <span> tags:

$results = $xpath->query('//div[@class="numberWrapper"]/span');

Here, since each <div> contains only the <span>, the result is the same, but it could differ if the <div>s held any other text.
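
As a sketch of a case where the two queries differ, assume a hypothetical wrapper that also holds a label next to the <span>. This markup is invented for illustration and is not from the page in the question:

```php
<?php
// Hypothetical markup: the wrapper contains text besides the <span>.
$str = '<div class="numberWrapper">digit: <span>6</span></div>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// nodeValue of the <div> is the text of all of its descendants.
$fromDiv = $xpath->query('//div[@class="numberWrapper"]')->item(0)->nodeValue;

// nodeValue of the <span> is only the digit.
$fromSpan = $xpath->query('//div[@class="numberWrapper"]/span')->item(0)->nodeValue;

var_dump($fromDiv);  // the label and the digit together
var_dump($fromSpan); // just the digit
```

Querying the <span> directly keeps the concatenated result limited to the digits even if the wrapper later gains extra text.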

Of course (just to make sure it's said): regular expressions are not the right way to extract information from an HTML string.

Edit after the comment:

If there are other <div>s you don't want to take into account, you'll have to find another XPath query -- that matches what you want to extract.

For example, maybe something like this would do the trick:

$results = $xpath->query('//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span');

Of course, it's up to you to find out exactly what matches the structure of your HTML document.
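
To illustrate what the scoped query buys you, here is a minimal sketch with one stray numberWrapper outside the numbersBackground container. The markup, the otherBox class, and the joinNodes() helper are all made up for this example; only the two XPath expressions come from the answer.

```php
<?php
// Hypothetical markup: one extra numberWrapper outside the container we care about.
$str = <<<HTML
<div class="otherBox"><div class="numberWrapper"><span>9</span></div></div>
<div class="numbersBackground">
    <div class="number"><div class="numberWrapper"><span>6</span></div></div>
    <div class="number"><div class="numberWrapper"><span>2</span></div></div>
    <div class="comma"></div>
    <div class="number"><div class="numberWrapper"><span>1</span></div></div>
</div>
HTML;

// Hypothetical helper: concatenate the text of every node matched by a query.
function joinNodes(DOMXPath $xpath, string $query): string
{
    $out = '';
    foreach ($xpath->query($query) as $node) {
        $out .= $node->nodeValue;
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Matches every numberWrapper in the document, including the stray one.
$unscoped = joinNodes($xpath, '//div[@class="numberWrapper"]/span');

// Matches only numberWrapper elements nested inside numbersBackground.
$scoped = joinNodes(
    $xpath,
    '//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span'
);

echo $unscoped, "\n"; // includes the stray 9
echo $scoped, "\n";   // only the digits inside numbersBackground
```

Note that @class="..." is an exact string comparison, so an element with several classes (for example class="numberWrapper big") would not match either query as written.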

If you want to download the HTML, you have two options:

If allow_url_fopen is enabled on your server, you can use DOMDocument::loadHTMLFile(), passing it the URL as a parameter.
Otherwise, you'll have to download the HTML content yourself, using, for instance, curl.
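
The two options above could be sketched roughly as follows. The function names loadDomFromUrl() and loadDomWithCurl() are hypothetical, the timeout and redirect settings are arbitrary choices, and the curl variant assumes the curl extension is installed:

```php
<?php
// Option 1: let DOMDocument fetch the document itself.
// For a remote URL this needs allow_url_fopen to be enabled.
function loadDomFromUrl(string $url): ?DOMDocument
{
    $dom = new DOMDocument();
    // @ silences warnings here; loadHTMLFile() returns false on failure.
    return @$dom->loadHTMLFile($url) ? $dom : null;
}

// Option 2: download the body with curl, then parse the resulting string.
function loadDomWithCurl(string $url): ?DOMDocument
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds

    $html = curl_exec($ch);
    if (!is_string($html) || $html === '') {
        return null;
    }

    $dom = new DOMDocument();
    return @$dom->loadHTML($html) ? $dom : null;
}

// Example call (not executed here), using the URL from the comments:
// $dom = loadDomFromUrl('http://www.bungie.net/stats/reach/online.aspx');
```

Either function returns a DOMDocument that can be handed to DOMXPath exactly as in the string-based example, or null if the page could not be fetched or parsed.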

As a side note, if you get warnings because your HTML is not valid, you'll want to take a look at the libxml_use_internal_errors() function ;-)
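
A minimal sketch of that approach, using a deliberately odd snippet invented for this example. Whether the parser actually reports anything for it depends on the libxml version, so the loop may print nothing:

```php
<?php
// Made-up markup: <foo> is not an HTML tag, which some libxml versions report.
$str = '<div class="numberWrapper"><foo><span>6</span></foo></div>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($str);

// Inspect whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// Restore the previous setting so other code is unaffected.
libxml_use_internal_errors($previous);

// The document is still usable even if errors were reported.
$xpath = new DOMXPath($dom);
$digit = $xpath->query('//div[@class="numberWrapper"]//span')->item(0)->nodeValue;
var_dump($digit);
```

Clearing the buffer with libxml_clear_errors() matters in long-running scripts, since collected errors otherwise accumulate in memory.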

+1: "The" correct solution if the input can be trusted to be well-formed. Beat me to it. — Jon, Mar 16 '11 at 19:14
@Jon `DOMDocument::loadHTML` accepts code that's not XML-valid : it works with broken HTML -- if not *too* broken. — Pascal MARTIN, Mar 16 '11 at 19:15
what if there are more divs with a class of number wrapper? and what would I use to direct the script to the webpage rather than entering a string http://www.bungie.net/stats/reach/online.aspx — AndrewFerrara, Mar 16 '11 at 19:19
@Andrew I've edited my answer with some additional information :-) — Pascal MARTIN, Mar 16 '11 at 19:25
Simpler might be to just extract that snippet from the DOM, then `strip_tags()` to leave just the numbers. Of course, that assumes the digits in question are the only text nodes in the snippet. — Marc B, Mar 16 '11 at 19:34
