I am trying to remove the tables from an HTML file. Specifically, for the document below, I'd like to remove everything from each <TABLE ...> tag through its matching </TABLE> tag. The document contains multiple tables with text in between.
The expression I came up with, <TABLE.*>\s*[\s|\S]*</TABLE>\s*, also removes the text between the tables: it removes everything from the first <TABLE> tag to the last </TABLE> tag. I would like to keep the text in between and remove only the tables. Any suggestions are greatly appreciated. Thanks.
====================
<TABLE STYLE=xxx, Font=yyy, etc>
table texts that should be DELETED...
</TABLE>
other texts that should be KEPT...
<TABLE STYLE=xxx, Font=yyy, etc>
table texts that should be DELETED...
</TABLE>
==========================================
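
For reference, here is a minimal sketch that reproduces the behavior described above and compares it with a non-greedy variant. It assumes Python's re module, since no regex flavor is named here. The names html, greedy, lazy and strip_tables are made up for illustration. The greedy [\s|\S]* runs to the end of the input and backtracks only as far as the last </TABLE>, while the lazy .*? stops at the first </TABLE> it can reach.

```python
import re

# Sample input mirroring the document above.
html = """<TABLE STYLE=xxx, Font=yyy, etc>
table texts that should be DELETED...
</TABLE>
other texts that should be KEPT...
<TABLE STYLE=xxx, Font=yyy, etc>
table texts that should be DELETED...
</TABLE>
"""

# The original expression: the greedy [\s|\S]* consumes everything
# up to the LAST </TABLE> in the input.
greedy = re.compile(r"<TABLE.*>\s*[\s|\S]*</TABLE>\s*")

# Non-greedy variant (an assumption, not the only possible fix):
#   [^>]*  keeps the opening-tag match inside a single tag
#   .*?    expands only until the FIRST following </TABLE>
#   DOTALL lets "." match newlines inside the table body
lazy = re.compile(r"<TABLE\b[^>]*>.*?</TABLE>\s*", re.DOTALL | re.IGNORECASE)


def strip_tables(text: str) -> str:
    """Remove each <TABLE ...>...</TABLE> block, keeping the text between blocks."""
    return lazy.sub("", text)


greedy_result = greedy.sub("", html)
lazy_result = strip_tables(html)

print(repr(greedy_result))  # everything is removed, including the text to keep
print(repr(lazy_result))    # only the text between the tables remains
```

This sketch assumes the tables are not nested. With a table inside a table, the lazy pattern would stop at the inner </TABLE>, and an HTML parser would likely be a safer tool than a regular expression.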