0

I want to extract some data from a table using PHP's preg_match_all(). My HTML is below; I want to get the values in the td elements, for example the product code RC063154016. How can I do that? I don't have any experience with regex.

  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
MJQ

4 Answers

3

Use DomDocument

$str = <<<STR
<table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
STR;

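// Parse the HTML; "@" suppresses warnings about malformed markup.
$dom = new DOMDocument();
@$dom->loadHTML($str);
// Collect every <td> in the document and print its text content.
$tds = $dom->getElementsByTagName('td');
foreach ($tds as $td) {
  echo $td->nodeValue . '<br>';
}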

OUTPUT

Product code: RC063154016
Gender: Female
gwillie
  • Yes, that's nice. But there are a lot of td elements in a webpage, and I want only the specific ones under that table with ! So what about that?
    – MJQ Jan 31 '14 at 12:16
  • Well, how do you identify what you want: id/class attributes, content of certain elements? Your choice. An excellent time to read up on [DomDocument](http://www.php.net/manual/en/class.domdocument.php) and [DOMXpath](http://www.php.net/manual/en/class.domxpath.php). With those two tools you can manipulate HTML reliably. Regex is not the best fit for structured languages. I use regex to parse simple HTML, but let's see your full table HTML, then we can determine the best path to use – gwillie Jan 31 '14 at 12:23
  • I can only identify it by table attributes like width="100%" border="0" cellspacing="0"! So, anything? – MJQ Jan 31 '14 at 12:37
  • Post the HTML table markup that you're trying to parse. Regex may be better, DomDoc may be better; let's see the code you're working with. A little nuance here and there adds up to mountains, if you understand what I mean :) – gwillie Jan 31 '14 at 12:47
  • Figured it out myself by using Query. Thanks for your answer! :) – MJQ Jan 31 '14 at 13:59
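
Following up on the comments above, here is a minimal sketch of how DOMXPath could limit the search to the table identified by its attributes. It assumes the attribute values are exactly width="100%", border="0" and cellspacing="0" as in the question; the extra table with class "other" and the $result variable are made up for illustration, and this is not necessarily the query the asker ended up using.

```php
<?php
// Markup from the question, plus an unrelated table (hypothetical) to show the filter.
$html = <<<HTML
<table class="other"><tr><td>ignore me</td></tr></table>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
  <tbody>
    <tr>
      <td><span>Product code:</span> RC063154016</td>
      <td><span>Gender:</span> Female</td>
    </tr>
  </tbody>
</table>
HTML;

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Only <td> cells inside a table that carries all three attribute values.
$cells = $xpath->query('//table[@width="100%"][@border="0"][@cellspacing="0"]//td');

$result = [];
foreach ($cells as $td) {
    // Split "Product code: RC063154016" at the first colon into label and value.
    $parts = array_map('trim', explode(':', $td->textContent, 2));
    if (count($parts) === 2) {
        $result[$parts[0]] = $parts[1];
    }
}
print_r($result);
```

With the sample markup, $result should map "Product code" to "RC063154016" and "Gender" to "Female", while the cell in the other table is skipped. If the real page has several tables with the same attributes, the XPath expression would need a more specific condition.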
0

This should work for you:

preg_match_all('|<td><span>Product code:</span>([^<]*)</td>|', $html, $match);

If there may be arbitrary whitespace around the tags, use this one instead:

preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);
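
Since the question mentions no regex experience, here is a small sketch of how the result of that call might be read. The sample $html string and the $count and $codes variables are assumptions for illustration; the captured text still includes the space after </span>, so it is trimmed.

```php
<?php
// Sample input (assumed): the two cells from the question on one line.
$html = '<td><span>Product code:</span> RC063154016</td><td><span>Gender:</span> Female</td>';

// Same whitespace-tolerant pattern as above; group 1 is the text after </span>.
$count = preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);

// $match[0] holds the full matches, $match[1] the captured values.
$codes = array_map('trim', $match[1]);
print_r($codes);
```

Here $count is the number of matches found, and $codes holds one trimmed product code per matching cell.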
Sabuj Hassan
0
$data = <<<HTML
  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
HTML;


if(preg_match_all('#<td>\s*<span>Product code:</span>\s*([^<]*)</td>#i', $data, $matches)) {
    print_r($matches);
}
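
If more than one field is needed, the same idea could be extended to capture the label as well as the value. This is only a sketch: it assumes every cell follows the `<td><span>Label:</span> value</td>` shape from the question, and the $fields variable is made up for illustration.

```php
<?php
// Sample input (assumed): the table row from the question.
$data = '<tr>
  <td><span>Product code:</span> RC063154016</td>
  <td><span>Gender:</span> Female</td>
</tr>';

$fields = [];
// Group 1: label inside <span> without the colon; group 2: text after </span>.
$pattern = '#<td>\s*<span>([^<:]*):</span>\s*([^<]*)</td>#i';
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    // PREG_SET_ORDER gives one array per cell: [full match, label, value].
    foreach ($matches as $m) {
        $fields[trim($m[1])] = trim($m[2]);
    }
}
print_r($fields);
```

The product code is then available as $fields['Product code']. Cells with nested tags or attributes on the td would not match this pattern.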
Hett
0

Use an HTML parser to parse the markup and work with the result. Don't use the preg_* functions here. Please read this answer: "How do you parse and process HTML/XML in PHP?"

Community
Mohan