0

I'm searching through HTML documents and trying to find tables that only contain a single row. What regex can I use to do this? I've tried negative lookahead and can isolate a single row, but I don't see how to ensure that there's only a single <tr></tr> between <table></table> tags.

Here's the regex I'm working with now:

<table[\W].*?<tr[\W].*?<\/tr>.*(?!.*<tr[\W])<\/table>

The regex should NOT match this:

<html>

<body>
  <table>
    <tr>
      <td>a</td>
    </tr>
    <tr>
      <td>b</td>
    </tr>
    <tr>
      <td>c</td>
    </tr>
    <tr>
      <td>d</td>
    </tr>
  </table>
</body>

</html>

The regex SHOULD match this:

<html>

<body>
  <table>
    <tr>
      <td>a</td>
    </tr>
  </table>
</body>

</html>
hourback
  • [Don't try to parse HTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use a specialized HTML-aware parsing tool, like [pup](https://github.com/ericchiang/pup). – Mark Reed Jul 15 '16 at 18:22

3 Answers

0

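This should work: <table>(?>[^<]++|<(?!\/tr>))*<\/tr>(?>[^<]++|<(?!\/tr>))*<\/table>

It matches only when exactly one </tr> appears between <table> and </table>. Both groups need the trailing * so that any number of other tags can sit before and after the single row. The atomic groups and possessive quantifiers require an engine that supports them, such as PCRE or Java.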

Details about it can be found here: Negative Lookaround Regex - Only one occurrence - Java
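For engines without atomic groups, the same idea can be sketched with a tempered pattern that allows exactly one <tr between <table and </table>. This is a minimal Python sketch, not the answer author's method: the function name find_single_row_tables is made up for illustration, and it assumes tables are not nested and that "<tr" never appears inside comments or attribute values.

```python
import re

# Any character that does not start "<table", "</table" or "<tr".
_FILLER = r"(?:(?!</?table\b|<tr\b).)*"

# <table ...>, filler, exactly one "<tr", filler, </table>.
SINGLE_ROW_TABLE = re.compile(
    r"<table\b[^>]*>" + _FILLER + r"<tr\b" + _FILLER + r"</table>",
    re.IGNORECASE | re.DOTALL,
)


def find_single_row_tables(html):
    """Return the source text of every table that contains exactly one <tr>."""
    return [m.group(0) for m in SINGLE_ROW_TABLE.finditer(html)]
```

Because the filler refuses to cross a second <tr or another table boundary, a multi-row table fails to match as a whole instead of matching a fragment of it.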

Community
padonald
0

You could use DOMDocument together with XPath functions (namely count()). Assuming you're using PHP (your question is tagged with PCRE):

<?php

$data = <<<DATA
<html>
<head/>
<body>
    <table id="two_rows">
        <tr><td>One column</td></tr>
        <tr><td>Another column</td></tr>
    </table>

    <table id="one_row">
        <tr><td>One column</td></tr>
    </table>
</body>
</html>
DATA;

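// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($data);

// Select every table that has exactly one direct <tr> child.
// Rows wrapped in an explicit <tbody> are not direct children and are not counted here.
$xpath = new DOMXPath($dom);
$tables = $xpath->query("//table[count(tr) = 1]");

// Print the id of each matching table.
foreach ($tables as $table) {
    echo $table->getAttribute("id"), PHP_EOL;
}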
?>


See a demo on ideone.com.
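If PHP is not available, the same parser-based idea can be approximated in other languages. The following is a rough Python sketch using only the standard library's html.parser. The class and function names are hypothetical, and it identifies tables by their id attribute purely to mirror the sample data above. Rows are credited to the innermost open table, so nested tables are counted separately.

```python
from html.parser import HTMLParser


class SingleRowTableFinder(HTMLParser):
    """Records (id, row count) for every <table> as it is closed."""

    def __init__(self):
        super().__init__()
        self._open = []    # stack of [table_id, row_count] for open tables
        self.tables = []   # (table_id, row_count) for each closed table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append([dict(attrs).get("id"), 0])
        elif tag == "tr" and self._open:
            self._open[-1][1] += 1  # credit the row to the innermost table

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            table_id, rows = self._open.pop()
            self.tables.append((table_id, rows))


def single_row_table_ids(html):
    """Return the ids of all tables that contain exactly one row."""
    finder = SingleRowTableFinder()
    finder.feed(html)
    finder.close()
    return [table_id for table_id, rows in finder.tables if rows == 1]
```

Unlike the XPath expression count(tr), this sketch also counts rows that sit inside a <tbody>, which may or may not be what you want.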
Jan
-1

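Using .match() you can count the <tr> elements.

Try this: (str.match(/<tr\b[\s\S]*?<\/tr>/g) || []).length

[\s\S] is used instead of . because . does not match newlines, so rows that span several lines would otherwise be missed. The || [] fallback avoids a TypeError when match() returns null because no row was found. Note that this counts rows in the whole string, so str should hold the markup of one table at a time.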

  • That's a JavaScript solution, correct? Mine needs to work outside of JavaScript. – hourback Jul 15 '16 at 19:09
  • Yes, it's a javascript solution. You should have something similar to javascript's `match()` in the language you are using. The logic would be the same. – Jerome Devost Jul 19 '16 at 12:47
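As an illustration of that last comment, here is how the same counting logic might look outside JavaScript. This is a sketch in Python under simplifying assumptions: tables are not nested, and the helper name tables_with_one_row is invented for the example. It first isolates each table, then counts the opening <tr tags inside it.

```python
import re

# Lazily match one table at a time; assumes tables are not nested.
TABLE = re.compile(r"<table\b[^>]*>.*?</table>", re.IGNORECASE | re.DOTALL)
# An opening <tr tag; \b keeps it from matching tags such as <track>.
ROW_OPEN = re.compile(r"<tr\b", re.IGNORECASE)


def tables_with_one_row(html):
    """Return the source of each table whose body contains exactly one <tr>."""
    return [
        table
        for table in TABLE.findall(html)
        if len(ROW_OPEN.findall(table)) == 1
    ]
```

Splitting the work into "find tables" and "count rows" keeps each pattern simple, at the cost of two passes over the text.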