This is what I use:
output = System.Text.RegularExpressions.Regex.Replace(output, "(?s)/th>(.*?)</tbody>", "$1")
Notice that I am using (.*?) because I want the match to be non-greedy (lazy). There are several /th> occurrences, and I want to get rid of everything before the LAST /th>.
This is what I got:
<!-- statistics_period -->
<input name="subForm" type="hidden" value="1">
<input name="hidTotal" type="hidden" value="886">
<div class="domlistframe">
<div class="divMainListingTable">
<table width="76%" align="left" class="mainListTable" cellspacing="0" cellpadding="3">
<tbody><tr>
<th nowrap=""> <
<th colspan="4"> </th>
<th id="sercol" nowrap="" colspan="11">Totals</th>
As you can see, there are still several /th> in the output.
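A cut-down reproduction may make the behavior easier to see. This sketch uses Python's `re` module instead of .NET, assuming the two engines behave the same for this pattern (they do for `(?s)` and the lazy `*?`), and the sample string is made up. The key point is that the engine returns the leftmost possible match: the match starts at the FIRST `/th>`, and the lazy `*?` only decides where the match ends. So only the first `/th>` and the `</tbody>` are stripped, which appears to be where the stray `<` after the first `<th nowrap="">` in the output above comes from.

```python
import re

# Made-up stand-in for the real page: several "/th>" before a single "</tbody>".
html = "<tr><th>A</th><th>B</th><th>Totals</th></tr><tr><td>1</td></tr></tbody>"

# Same pattern as the .NET call; Python writes the backreference as \1 instead of $1.
result = re.sub(r"(?s)/th>(.*?)</tbody>", r"\1", html)

print(result)
# <tr><th>A<<th>B</th><th>Totals</th></tr><tr><td>1</td></tr>
```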
Yes, I know full well the horrible consequences of parsing HTML with regular expressions, as described in "RegEx match open tags except XHTML self-contained tags".
I am mostly parsing tables anyway, and it works.
Note: here is a simpler problem that is equivalent to the one above. Say I have text like this:
cow cow cow chicken cat cow cat dog hello bla.
Say I want "cat dog hello", that is, the text between the last cow and bla.
What would be the regular expression for that?
Notice that I want the text between the LAST cow and bla.
Using cow.*bla gives me the whole text.
Using cow.*?bla should give me what I want. However, as you can see from the HTML output above, it didn't work.
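A small Python sketch of both attempts, assuming Python's `re` behaves like .NET `Regex` here (it does for `.*?`, negative lookahead and `(?s)`). It shows why the lazy version does not help: both patterns start at the leftmost `cow` (position 0), and since there is only one `bla`, they end at the same place. The two alternatives that follow are only possible ways to anchor on the last `cow`, not the only ones, and the HTML string is made up.

```python
import re

text = "cow cow cow chicken cat cow cat dog hello bla."

# Both attempts start at the leftmost "cow"; laziness only affects where the match ends.
greedy = re.search(r"cow.*bla", text).group()
lazy = re.search(r"cow.*?bla", text).group()
print(greedy == lazy)  # True

# Option 1: a greedy prefix ".*" backtracks until it stops just before the LAST "cow".
last_greedy = re.search(r".*cow(.*?)bla", text).group(1)

# Option 2: a "tempered" token that refuses to step over another "cow",
# so only a match starting at the last "cow" can reach "bla".
last_tempered = re.search(r"cow((?:(?!cow).)*?)bla", text).group(1)

print(repr(last_greedy.strip()))    # 'cat dog hello'
print(repr(last_tempered.strip()))  # 'cat dog hello'

# The greedy-prefix idea applied to the original HTML pattern (made-up sample):
html = "<tr><th>A</th><th>B</th><th>Totals</th></tr><tr><td>1</td></tr></tbody>"
print(re.sub(r"(?s).*/th>(.*?)</tbody>", r"\1", html))
# </tr><tr><td>1</td></tr>
```

One design caveat: the greedy-prefix version assumes a single `</tbody>` in the input. On a page with several tables, `.*` would skip to the last `/th>` followed by any later `</tbody>`, discarding the earlier tables too; the tempered-token form keeps each match confined to the stretch after the last delimiter.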