What regular expression can I use to find in an HTML document tables that only contain one row?

Question

I'm searching through HTML documents and trying to find tables that only contain a single row. What regex can I use to do this? I've tried negative lookahead and can isolate a single row, but I don't see how to ensure that there's only a single <tr></tr> between <table></table> tags.

Here's the regex I'm working with now:

<table[\W].*?<tr[\W].*?<\/tr>.*(?!.*<tr[\W])<\/table>

The regex should NOT match this document:

<html>

<body>
  <table>
    <tr>
      <td>a</td>
    </tr>
    <tr>
      <td>b</td>
    </tr>
    <tr>
      <td>c</td>
    </tr>
    <tr>
      <td>d</td>
    </tr>
  </table>
</body>

</html>

The regex SHOULD match this document:

<html>

<body>
  <table>
    <tr>
      <td>a</td>
    </tr>
  </table>
</body>

</html>

[Don't try to parse HTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use a specialized HTML-aware parsing tool, like [pup](https://github.com/ericchiang/pup). — Mark Reed, Jul 15 '16 at 18:22

score 0 · Answer 1 · edited May 23 '17 at 10:28

This should work: <table>(?>[^<]++|<(?!\/tr>))*<\/tr>(?>[^<]++|<(?!\/tr>))*<\/table>

It matches only when exactly one </tr> appears between <table> and </table>.

Details about it can be found here: Negative Lookaround Regex - Only one occurrence - Java
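The single-</tr> idea in this answer can be checked against the two sample documents from the question. Below is a minimal Python sketch of a variant, not the answer's exact pattern. It uses a tempered lookahead instead of atomic groups so that it also runs on older versions of Python's `re` module. It also allows attributes on `<table>` and refuses to cross a `</table>` boundary. The names `ONE_ROW_TABLE` and `find_one_row_tables` are made up for illustration.

```python
import re

# A table whose body contains exactly one </tr> before the nearest </table>.
# (?:(?!X).)* consumes one character at a time while X does not start here.
ONE_ROW_TABLE = re.compile(
    r"<table\b[^>]*>"              # opening tag, attributes allowed
    r"(?:(?!</tr>|</table>).)*"    # everything up to the first </tr>
    r"</tr>"                       # the closing tag of the single row
    r"(?:(?!</tr>|</table>).)*"    # no second </tr> before the table ends
    r"</table>",
    re.DOTALL | re.IGNORECASE,
)


def find_one_row_tables(html):
    """Return the source text of every table that contains exactly one </tr>."""
    return [m.group(0) for m in ONE_ROW_TABLE.finditer(html)]


multi_row = "<table>\n<tr><td>a</td></tr>\n<tr><td>b</td></tr>\n</table>"
single_row = "<table>\n<tr><td>a</td></tr>\n</table>"

print(len(find_one_row_tables(multi_row)))   # 0
print(len(find_one_row_tables(single_row)))  # 1
```

Like any regex over HTML, this sketch does not handle nested tables, comments, or markup that omits the closing </tr>.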

answered Jul 15 '16 at 19:03

padonald

score 0 · Answer 2 · answered Jul 15 '16 at 19:30

You could use DOMDocument together with XPath functions (namely count()), assuming you're using PHP (your question is tagged PCRE):

<?php

$data = <<<DATA
<html>
<head/>
<body>
    <table id="two_rows">
        <tr><td>One column</td></tr>
        <tr><td>Another column</td></tr>
    </table>

    <table id="one_row">
        <tr><td>One column</td></tr>
    </table>
</body>
</html>
DATA;

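// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($data);

// Select every table that has exactly one <tr> as a direct child.
$xpath = new DOMXPath($dom);
$tables = $xpath->query("//table[count(tr) = 1]");

// Print the id of each matching table.
foreach ($tables as $table) {
    echo $table->getAttribute('id'), PHP_EOL;
}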
?>

See a demo on ideone.com.
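For readers who are not on PHP, the same count-the-rows idea can be sketched with Python's standard-library `html.parser`. This is an illustration under stated assumptions, not a port of the XPath query. The class name `OneRowTableFinder` and the helper `one_row_table_ids` are hypothetical. Unlike `count(tr)`, the sketch counts every `<tr>` that belongs to the innermost open table, including rows wrapped in `<tbody>`.

```python
from html.parser import HTMLParser


class OneRowTableFinder(HTMLParser):
    """Collect the id (or None) of every table that has exactly one <tr>."""

    def __init__(self):
        super().__init__()
        self._open = []     # stack of [table_id, row_count] for open tables
        self.one_row = []   # ids of closed tables that had exactly one row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append([dict(attrs).get("id"), 0])
        elif tag == "tr" and self._open:
            self._open[-1][1] += 1  # count the row for the innermost table

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            table_id, rows = self._open.pop()
            if rows == 1:
                self.one_row.append(table_id)


def one_row_table_ids(html):
    """Return the ids of all tables in html that contain exactly one row."""
    finder = OneRowTableFinder()
    finder.feed(html)
    finder.close()
    return finder.one_row


sample = """
<table id="two_rows">
    <tr><td>One column</td></tr>
    <tr><td>Another column</td></tr>
</table>
<table id="one_row">
    <tr><td>One column</td></tr>
</table>
"""

print(one_row_table_ids(sample))  # ['one_row']
```

Keeping a stack of open tables means a nested table's rows are not charged to the table that encloses it.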

answered Jul 15 '16 at 19:30

Jan

Why would you assume PHP? PCRE is a C library ;) – Lucas Trzesniewski Jul 15 '16 at 19:48
@LucasTrzesniewski: Just my *spidey sense* :) Who is trying to analyze `HTML` in `C` nowadays, anyway? – Jan Jul 15 '16 at 20:37

score -1 · Answer 3 · answered Jul 15 '16 at 18:34

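Using .match(), you can count the <tr> elements.

Try this: (str.match(/<tr[\s\S]*?<\/tr>/g) || []).length

[\s\S] is used instead of . so that rows spanning several lines are counted, and || [] covers the case where match() returns null because there are no rows.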
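The counting approach is not tied to JavaScript. As a rough sketch, the same logic in Python could first split the document into table blocks and then count the rows inside each one, so that the count applies per table rather than to the whole document. The names `TABLE`, `ROW`, and `tables_with_row_count` are invented for this example, and nested tables are not handled.

```python
import re

# Lazy matches, with DOTALL so that "." also crosses line breaks.
TABLE = re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL | re.IGNORECASE)
ROW = re.compile(r"<tr\b[^>]*>.*?</tr>", re.DOTALL | re.IGNORECASE)


def tables_with_row_count(html, wanted=1):
    """Return the source text of each table holding exactly `wanted` rows."""
    return [t for t in TABLE.findall(html) if len(ROW.findall(t)) == wanted]


html = (
    '<table id="two_rows"><tr><td>1</td></tr>\n<tr><td>2</td></tr></table>\n'
    '<table id="one_row"><tr><td>1</td></tr></table>'
)

matches = tables_with_row_count(html)
print(len(matches))                 # 1
print('id="one_row"' in matches[0])  # True
```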

answered Jul 15 '16 at 18:34

Jerome Devost

That's a JavaScript solution, correct? Mine needs to work outside of JavaScript. – hourback Jul 15 '16 at 19:09
Yes, it's a JavaScript solution. The language you are using should have something similar to JavaScript's `match()`, and the logic would be the same. – Jerome Devost Jul 19 '16 at 12:47
