Get data from table using regex php

Question

I want to extract some data from a table using PHP's preg_match_all(). I have the HTML below and want to get the values in the td elements, for example Product code: RC063154016. How can I do that? I don't have any experience with regex.

  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>

[DomDocument](http://www.php.net/manual/en/class.domdocument.php) might be better. Take a look at [this](http://stackoverflow.com/a/4423796/1057527). — machineaddict, Jan 31 '14 at 12:02

score 3 · Accepted Answer · answered Jan 31 '14 at 12:02

Use DOMDocument:

$str = <<<STR
<table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>                   
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
STR;

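$dom = new DOMDocument();
// @ suppresses the warnings libxml raises for HTML fragments and imperfect markup
@$dom->loadHTML($str);
// Collect every <td> in the document and print its text content
$tds = $dom->getElementsByTagName('td');
foreach ($tds as $td) {
  echo $td->nodeValue . '<br>';
}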

OUTPUT

Product code: RC063154016
Gender: Female

answered Jan 31 '14 at 12:02 by gwillie

Yes, that's nice. But there are a lot of td elements in a webpage, and I want the specific ones under that table with ! So what about that? – MJQ Jan 31 '14 at 12:16
Well, how do you identify what you want? By id/class attributes or the content of certain elements, your choice. This is an excellent time to read up on [DomDocument](http://www.php.net/manual/en/class.domdocument.php) and [DOMXpath](http://www.php.net/manual/en/class.domxpath.php). With those two tools you can manipulate HTML reliably. Regex is not the best fit for structured languages. I use regex to parse simple HTML, but let's see your full table HTML, then we can determine the best path to take. – gwillie Jan 31 '14 at 12:23
I can only identify it by table attributes like width="100%" border="0" cellspacing="0"! So, anything? – MJQ Jan 31 '14 at 12:37
Post the HTML table markup that you're trying to parse. Regex may be better, or DOMDocument may be better; let's see the code you're working with. A little nuance here and there adds up to mountains, if you understand what I mean :) – gwillie Jan 31 '14 at 12:47
Figured it out myself by using Query. Thanks for your answer! :) – MJQ Jan 31 '14 at 13:59
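
For the follow-up raised in the comments (selecting only the cells under a table identified by its attributes), one possible approach with DOMXPath is sketched below. The extra navigation table, the variable names, and the split of each cell into label and value are illustrative assumptions, not necessarily what the asker ended up using. The array destructuring syntax assumes PHP 7.1 or later.

```php
<?php
// Hypothetical page: two tables, only the second carries the target attributes.
$html = <<<HTML
<table class="nav"><tr><td>Home</td></tr></table>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
  <tbody>
    <tr>
      <td><span>Product code:</span> RC063154016</td>
      <td><span>Gender:</span> Female</td>
    </tr>
  </tbody>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match only <td> cells inside a table with the attributes named in the comments.
$cells = $xpath->query('//table[@width="100%"][@border="0"][@cellspacing="0"]//td');

$values = [];
foreach ($cells as $td) {
    // Split "Label: value" at the first colon; assumes every selected cell has that shape.
    [$label, $value] = array_map('trim', explode(':', $td->textContent, 2));
    $values[$label] = $value;
}

print_r($values);
```

With this, `$values['Product code']` holds the code and the cell from the navigation table is never visited, because the XPath predicates filter on the table's attributes.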

score 0 · Answer 2 · answered Jan 31 '14 at 12:04

This should work for you:

preg_match_all('|<td><span>Product code:</span>([^<]*)</td>|', $html, $match);

If there may be arbitrary whitespace around the tags, use this one instead:

preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);
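
As a minimal sketch of how the result might be read back: preg_match_all() fills `$match[0]` with the full matches and `$match[1]` with the first capture group. The sample `$html` string and the `$codes` variable are assumptions made for illustration. Because the group starts right after `</span>`, the captured text includes the leading space, so it is trimmed here.

```php
<?php
// Sample input based on the markup in the question (assumed to be in $html).
$html = '<td><span>Product code:</span> RC063154016</td><td><span>Gender:</span> Female</td>';

preg_match_all('|<td>\s*<span>\s*Product code:\s*</span>([^<]*)</td>|', $html, $match);

// $match[1] is the list of captured values; trim the surrounding whitespace.
$codes = array_map('trim', $match[1]);

print_r($codes);
```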

score 0 · Answer 3 · answered Jan 31 '14 at 12:04

$data = <<<HTML
  <table width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td><span>Product code:</span> RC063154016</td>
          <td><span>Gender:</span> Female</td>
        </tr>
      </tbody>
    </table>
HTML;


if(preg_match_all('#<td>\s*<span>Product code:</span>\s*([^<]*)</td>#i', $data, $matches)) {
    print_r($matches);
}
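
If every cell is needed rather than just the product code, the same idea could be extended with a second capture group for the label. This is only a sketch under the assumption that each cell looks like `<td><span>Label:</span> value</td>`; the pattern, the `$fields` array, and the use of PREG_SET_ORDER are illustrative choices, and markup that deviates from that shape would be skipped silently.

```php
<?php
$data = <<<HTML
<tr>
  <td><span>Product code:</span> RC063154016</td>
  <td><span>Gender:</span> Female</td>
</tr>
HTML;

$fields = [];
// Group 1 = label (text before the colon), group 2 = value (text up to the next tag).
$pattern = '#<td>\s*<span>([^<:]+):</span>\s*([^<]*)</td>#i';
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    // PREG_SET_ORDER yields one array per matched cell: [full match, label, value].
    foreach ($matches as $m) {
        $fields[trim($m[1])] = trim($m[2]);
    }
}

print_r($fields);
```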

score 0 · Answer 4 · edited May 23 '17 at 12:11

Use an HTML parser to parse the markup and work with the result. Don't use the preg_* functions here. Please read this answer: How do you parse and process HTML/XML in PHP?

answered Jan 31 '14 at 12:05 by Mohan · edited May 23 '17 at 12:11 by Community
