This is what I use

    output = System.Text.RegularExpressions.Regex.Replace(output, "(?s)/th>(.*?)</tbody>", "$1")

Notice that I am using (.*?) because I want the search to be non-greedy. There are several /th> in the text, and I want to get rid of everything above the LAST /th>.

This is what I got.

<!-- statistics_period -->


<input name="subForm" type="hidden" value="1">
<input name="hidTotal" type="hidden" value="886">

<div class="domlistframe">
<div class="divMainListingTable">
<table width="76%" align="left" class="mainListTable" cellspacing="0" cellpadding="3">
    <tbody><tr>
                                                                        <th nowrap="">&nbsp;<               
                                                        <th colspan="4">&nbsp;</th>



        <th id="sercol" nowrap="" colspan="11">Totals</th>

You see? Several /th> there.

Yes, I know full well the horrible consequences of parsing HTML with regular expressions, as described in "RegEx match open tags except XHTML self-contained tags".

I am mostly parsing tables anyway, and it works.

Note: here is a simpler problem that is equivalent to the one above. Say I have a text like this:

cow cow cow chicken cat cow cat dog hello bla.

Say I want cat dog hello, that is, the text between the last cow and bla.

What would be the regular expression for that?

Notice I want the text between the LAST cow and bla.

Using

cow.*bla

gives me the whole text.

Using cow.*?bla should give me what I want. However, as you can see from the sample I used, it didn't work.

user4951

2 Answers

HINT

Try the pattern:

.*cow((?!cow).*?)bla

for the cow..bla problem.

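The leading .* is greedy: it first consumes the whole line, then backtracks only as far as it must, so the literal cow after it lands on the last cow that is still followed by bla.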
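As a rough check of the hint, here is a minimal sketch in Python rather than .NET (an assumption made only so it is easy to run; for these particular patterns the two engines should behave the same). The variable names are made up for illustration.

```python
import re

text = "cow cow cow chicken cat cow cat dog hello bla."

# Plain greedy pattern: the match starts at the first cow
# and runs through to bla.
whole = re.search(r"cow.*bla", text).group(0)

# Hinted pattern: the leading greedy .* swallows the earlier cows,
# so group 1 holds only what sits between the last cow and bla.
between = re.search(r".*cow((?!cow).*?)bla", text).group(1)

print(repr(whole))
print(repr(between))  # ' cat dog hello '
```

Note that the captured text keeps its surrounding spaces; call `.strip()` on it if they are not wanted.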

hjpotter92

This is only a partial answer. Basically I solved the problem by using the technique hjpotter92 uses.

What I did is

    output = System.Text.RegularExpressions.Regex.Replace(output, "(?s).*/th>(.*?)</tbody>", "$1")

Because the first .* is greedy, it matches the longest possible prefix that still ends in /th>, so the /th> in the pattern lines up with the last /th> before </tbody>.
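A minimal sketch of the difference between the two patterns, written in Python instead of VB.NET purely so it can be run quickly (that choice, and the shortened `html` string standing in for the real page, are assumptions for illustration):

```python
import re

# Hypothetical, shortened stand-in for the table markup in the question.
html = (
    "<table>\n<tbody><tr>\n"
    "<th>&nbsp;</th>\n"
    '<th colspan="4">&nbsp;</th>\n'
    '<th id="sercol">Totals</th>\n'
    "</tr>\n<tr><td>row</td></tr>\n"
    "</tbody></table>"
)

# Original pattern: the match begins at the FIRST /th>, so the later
# header cells survive inside the captured group.
original = re.sub(r"(?s)/th>(.*?)</tbody>", r"\1", html)

# Pattern with a leading greedy .*: everything up to and including
# the LAST /th> is consumed before the group starts.
fixed = re.sub(r"(?s).*/th>(.*?)</tbody>", r"\1", html)

print(original)
print(fixed)
```

With the original pattern the output still begins with the opening markup and a dangling `&nbsp;<`, much like the result shown in the question; with the leading `.*` only the rows after the header cells remain.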

One question remains: why doesn't my original code work?

I suspect it has something to do with the regular expression engine working from left to right. Again, any input is welcome.
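That suspicion fits how backtracking regex engines generally behave: they try start positions from left to right and return the first position where a match is possible. A lazy quantifier such as .*? only decides how far the match extends to the right; it never moves the starting point. A small sketch, using Python's `re` as a stand-in for the .NET engine (an assumption; the names are illustrative):

```python
import re

text = "cow cow cow chicken cat cow cat dog hello bla."

# Lazy version: laziness cannot move the start, so the match
# still begins at the very first cow.
lazy = re.search(r"cow.*?bla", text)

# A leading greedy .* shifts the literal cow to its last occurrence.
anchored = re.search(r".*cow(.*?)bla", text)

print(lazy.start())       # 0
print(anchored.start(1))  # index just after the last cow
```

Since there is only one bla in the sample, the lazy and greedy versions of cow...bla end up matching the same text here.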

I would also like to thank hjpotter92 for showing me the (?!...) negative lookahead, which acts as a kind of complement operator in regex.

Hmmm... Well, this answer does cover what I should do to make it work, and it is working now. However, it is based on the other answer. Which one should I pick as the accepted answer?

user4951