I have the following HTML:

<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>

I want to get the value of <div class="statnum"> inside <td class="stat stat-last">, which is 22.

I have tried the following regex, but it does not find any match.

/<div\sclass="statnum">^(.)\?<\/div>/ig
Andy Lester
Muhammad Hassaan
  • 2
    Enable error_reporting. Neither the `/g` flag nor the `^` anchor would work there, and the escaped `\?` is misplaced as well. A typical placeholder is `(.*?)`. But if you're that unversed in regexp, the off-topic answer to your question would be to use a DOM traversal frontend (such as `qp($html)->find(".statnum")`, or plain DOMDocument if you'd prefer tedious and brittle). – mario Aug 20 '15 at 12:34
  • 1
    I think you shouldn't use `^` in that place. Try this: `/<div\sclass="statnum">([^>]+)<\/div>/ig`. – starikovs Aug 20 '15 at 12:43
  • Anyway, it's not a good idea to parse HTML with regexps. You will always find a new bug. – starikovs Aug 20 '15 at 12:47
  • If I'm not wrong, what you actually need here is the text content of the `div` elements, i.e. `8,13,22`. – Narendrasingh Sisodia Aug 20 '15 at 12:50

4 Answers

3

Here's a way to accomplish this using a parser.

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { //loop through all Cells
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) { // strpos() can return 0, so compare against false
        $divs = $cell->getElementsByTagName('div');
        foreach($divs as $div) { // loop through all divs of the cell
            if($div->getAttribute('class') == 'statnum'){
                echo $div->nodeValue;
            }
        }
    }
}

Output:

22

...or using an xpath...

$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
    echo $statnum->nodeValue;
}

Output:

22
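
The XPath above compares the whole `class` attribute, so it only matches when the attribute is exactly `stat stat-last`. As a rough sketch of a more tolerant query, the version below matches `stat-last` as a single class token wherever it appears in the attribute. The helper name `statnumsInLastCell` and the shortened sample markup are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the text of every div.statnum found inside
// a td whose class list contains the token "stat-last".
function statnumsInLastCell(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml may emit for imperfect real-world markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // The concat/normalize-space idiom matches a whole class token,
    // regardless of its position or of extra classes on the element.
    $query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
           . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";

    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

// Sample markup (an assumption): the class order is swapped on purpose.
$html = '<table class="profile-stats"><tr>'
      . '<td class="stat"><div class="statnum">8</div></td>'
      . '<td class="stat-last stat"><a href="/THEDJMHA/followers">'
      . '<div class="statnum">22</div></a></td>'
      . '</tr></table>';

print_r(statnumsInLastCell($html));
```

Using `//div` instead of `/a/div` also means the query keeps working if the link wrapper is absent, as it is in the first cell of the question's markup.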

or if you realllly wanted to regex it...

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num);
echo $num[1];

Output:

22

Regex demo: https://regex101.com/r/kM6kI2/1

chris85
2

I think it would be better if you use an XML parser for that instead of regex. SimpleXML can do the job for you: http://php.net/manual/en/book.simplexml.php

veta
  • And how does one get the value from the node with the software you suggested? – dakab Aug 20 '15 at 12:39
  • HTML is a specific XML, so it will work with HTML. The SimpleXMLElement class will contain all data related with the node. – veta Aug 20 '15 at 12:42
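
To make the SimpleXML suggestion concrete, here is a minimal sketch. It assumes the markup is well-formed XML, which happens to be true for the snippet in the question but often is not for real-world HTML, in which case `simplexml_load_string()` returns `false`. The variable names and the shortened sample markup are illustrative only.

```php
<?php
// Shortened copy of the question's markup (an assumption for illustration).
$html = <<<'HTML'
<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
    </td>
    <td class="stat stat-last">
      <a href="/THEDJMHA/followers">
        <div class="statnum">22</div>
      </a>
    </td>
  </tr>
</table>
HTML;

// simplexml_load_string() returns false if the input is not well-formed XML.
$xml = simplexml_load_string($html);

$followers = null;
if ($xml !== false) {
    // xpath() returns an array of SimpleXMLElement objects.
    $nodes = $xml->xpath("//td[@class='stat stat-last']//div[@class='statnum']");
    if (!empty($nodes)) {
        // Casting an element to string yields its text content.
        $followers = trim((string) $nodes[0]);
    }
}

echo $followers;
```

For markup that is not well-formed, `DOMDocument::loadHTML()` is usually the safer choice because it tolerates tag soup.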
2
/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si

Your match is in the first capture group. Note the `s` option at the end, which makes `.` match newline characters.
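
A small sketch of how this pattern might be used from PHP, and of what the `s` modifier changes. It assumes `preg_match()` is the function in use; the sample markup and variable names are illustrative.

```php
<?php
// Multi-line sample (an assumption), shaped like the last cell in the question.
$html = <<<'HTML'
<td class="stat stat-last">
    <a href="/THEDJMHA/followers">
      <div class="statnum">22</div>
    </a>
</td>
HTML;

$withS    = '/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si';
$withoutS = '/<td class="stat stat-last">.*?<div class="statnum">(\d+)/i';

// With the s (PCRE_DOTALL) modifier, "." also matches newlines,
// so ".*?" can span the line breaks between the td and the div.
$foundWithS = preg_match($withS, $html, $m);
$followers  = $foundWithS === 1 ? $m[1] : null;

// Without it, "." cannot cross a newline and the pattern fails to match.
$foundWithoutS = preg_match($withoutS, $html);

var_dump($followers, $foundWithoutS);
```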

jmrah
1

You can change your pattern to this:

/<div\sclass="statnum">(.*?)<\/div>/ig
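
If the pattern is run through PHP's `preg_*` functions (an assumption, since the question does not say which function is used), note that PCRE has no `g` modifier; `preg_match_all()` does the global search instead. A minimal sketch with illustrative sample markup and variable names:

```php
<?php
// Compact sample (an assumption) with the three stat cells from the question.
$html = <<<'HTML'
<td class="stat"><div class="statnum">8</div></td>
<td class="stat"><a href="/THEDJMHA/following"><div class="statnum">13</div></a></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers"><div class="statnum">22</div></a></td>
HTML;

// Same pattern as above, minus the g flag, which PHP rejects as an unknown modifier.
$count = preg_match_all('/<div\sclass="statnum">(.*?)<\/div>/i', $html, $matches);

$allStats  = $matches[1];     // every captured statnum, in document order
$followers = end($allStats);  // the last one sits in the stat-last cell in this markup

print_r($allStats);
echo $followers;
```

Taking the last match relies on `stat-last` being the final cell, so it is more fragile than anchoring the pattern on the `td` itself.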
mocak