
I'm trying to write a regular expression that will capture an HTML table (and all its table data) that has a particular class.

For example, the table has a recapLinks class; it is made up of numerous table rows and table data cells, and ends with the closing tags shown below:

<table width="100%" class="recapLinks" cellspacing="0">

[numerous table rows and data in the table.]

</td></tr></tbody></table>

I'm using JavaScript.

Ryan J
Mutuelinvestor
  • 4
    A regular expression would be extremely tricky. Would not the DOM method `.getElementsByClassName` be sufficient or do you absolutely require a regex answer? – Jonathan Gray Feb 21 '15 at 04:38
  • 2
    You will never write one that is foolproof, that's not what regex is for. – Ruan Mendes Feb 21 '15 at 04:45
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Uyghur Lives Matter Feb 21 '15 at 04:57
  • If I understand what you're asking, I believe what you're attempting is called "screen scraping" or "HTML scraping". I'm no expert at this, but I was looking into something like it not too long ago. This may help you or shed some light: http://tinyurl.com/JavaMexScrapping. If not, then my apologies. – SorryEh Feb 21 '15 at 05:28

1 Answer


The regex to capture this is pretty simple, provided you can guarantee that there are never nested tables. Nested tables become much trickier to deal with.

/<table[^>]*class=("|')?[^>]*?\bCLASSNAMEHERE\b[^>]*?\1[^>]*>([\s\S]*?)<\/table>/i

It does have limits. For instance, if an attribute before class contained a closing > (unlikely, but possible), the regex would fall flat on its face. A more complex regex can try to prepare for that, but it's really not worth the effort.
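
To make the trade-off concrete, here is a minimal sketch of how a pattern of this shape might be applied to an HTML string. The helper name `extractTableByClass` and both sample strings are made up for illustration and are not part of the question. The wildcards around the class name are written as `[^>]*?` so they cannot run past the end of the opening tag. The second sample shows the nested-table problem: the lazy match stops at the first `</table>` it finds.

```javascript
// Hypothetical helper (an illustration, not a general HTML parser):
// returns the first <table>...</table> whose opening tag mentions className.
function extractTableByClass(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "<table[^>]*class=(\"|')?[^>]*?\\b" + escaped +
      "\\b[^>]*?\\1[^>]*>([\\s\\S]*?)</table>",
    "i"
  );
  const match = pattern.exec(html);
  return match ? match[0] : null;
}

// Assumed sample input: one flat table surrounded by other markup.
const flat =
  '<p>intro</p>' +
  '<table width="100%" class="recapLinks" cellspacing="0">' +
  '<tbody><tr><td>A</td></tr></tbody></table>' +
  '<p>outro</p>';

// Assumed sample input: the same class, but with a table nested inside.
const nested =
  '<table class="recapLinks"><tr><td>' +
  '<table><tr><td>inner</td></tr></table>' +
  '</td></tr></table>';

const flatResult = extractTableByClass(flat, "recapLinks");
const nestedResult = extractTableByClass(nested, "recapLinks");

console.log(flatResult);              // the complete recapLinks table
console.log(nestedResult === nested); // false: cut off at the inner </table>
```

For flat markup this returns the whole table. For the nested sample the result ends at the inner table's closing tag, which is exactly the case the regex cannot handle.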

However, jQuery all by itself can make this a breeze, if these elements are within the DOM. Regex can be easily fooled or tripped up, deliberately or accidentally, and that's why we have parsers. jQuery doesn't care what's nested or not within the element. It doesn't care about quote style, multiline, any of that.

$(document).ready(function () {
  console.log($("table.myClassHere").prop("outerHTML"))
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<table class="myClassHere">
  <tr>
    <td>Book Series</td>
  </tr>
  <tr>
    <td>Pern</td>
  </tr>
  <tr>
    <td>Hobbit</td>
  </tr>
</table>

<table class="otherClassHere">
  <tr>
    <td>Movies</td>
  </tr>
  <tr>
    <td>Avengers</td>
  </tr>
  <tr>
    <td>Matrix</td>
  </tr>
</table>
Regular Jo