I am attempting to capture all column contents within HTML tables. I'm very close, but my regex is only capturing the first column of each table. What do I need to do to capture all of the columns?

Here is my regex and HTML: https://regex101.com/r/jA3sS6/1

Colin
  • Any reason for not using PHP `DOMDocument`? – frz3993 Mar 30 '16 at 19:35
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Build a state machine (or use frz3993's method. It's probably a state machine under the hood) – Petro Mar 30 '16 at 19:36
  • Wow, I wish I'd known about https://regex101.com a long time ago. – David White Mar 30 '16 at 20:57

1 Answer

Don't use a regular expression; use a parser instead!

Start with this:

$dom = new DOMDocument();
libxml_use_internal_errors( true );   // collect libxml parse errors instead of emitting warnings
$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );

To retrieve all <td>:

foreach( $dom->getElementsByTagName( 'td' ) as $td )
{
    echo $td->nodeValue . PHP_EOL;
}
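
Since the question asks for every column of each table, it may help to keep the cells grouped by row rather than printing one flat list. The following is only a sketch: the `$html` string is made-up sample markup (the real HTML is behind the regex101 link and is not reproduced here), and `$rows` and `$cells` are illustrative names.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<table>'
      . '<tr><td class="large-text">A1</td><td>B1</td></tr>'
      . '<tr><td class="large-text">A2</td><td>B2</td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parse warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//table//tr') as $tr) {
    $cells = [];
    // './td' is evaluated relative to the current <tr> node.
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    if ($cells !== []) {            // skip rows that contain no <td> (e.g. header rows)
        $rows[] = $cells;
    }
}

print_r($rows);
```

Each entry of `$rows` then holds all columns of one table row, so `$rows[0][1]` would be the second column of the first row.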

To retrieve all <td class="large-text">:

foreach( $xpath->query( '//td[@class="large-text"]' ) as $td )
{
    echo $td->nodeValue . PHP_EOL;
}
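
One caveat: `@class="large-text"` compares the whole attribute value, so a cell written as `<td class="large-text bold">` would not match. If the real markup uses several classes per cell (an assumption; the question's HTML is not shown here), a common XPath 1.0 idiom is to test for the class as a whitespace-delimited token. A minimal sketch with made-up sample markup:

```php
<?php
// Hypothetical sample markup: one cell carries two classes.
$html = '<table><tr>'
      . '<td class="large-text bold">A</td>'
      . '<td class="large-text">B</td>'
      . '<td class="small">C</td>'
      . '</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces, then look for " large-text ".
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " large-text ")]';

$values = [];
foreach ($xpath->query($query) as $td) {
    $values[] = $td->nodeValue;
}

print_r($values);
```

With this sample, the first two cells are selected and the `small` cell is skipped.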

fusion3k