Is there a way to group part of a pattern without having that group appear in the resulting match groups? For example, suppose I have a string with two lines:

<td>text 1</td>
<td><a href=whatever>this is</a> text 2</td>

and I want to extract "text 1" and "this is text 2". Right now I'm using this pattern:

<td>(<a href=.+?>)?(.+?(</a>)?.+?)</td>

Basically, I'm grouping the anchor tags so that the pattern can match them zero or one time. I don't want those groups to appear in the match results (though I can easily ignore them). Is there a proper way to do this?

Richard Simões
toasteroven
  • Regular expressions are insufficient for what you are trying to do: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Richard Simões Nov 20 '09 at 23:34

1 Answer

You can use a non-capturing group:

(?:xxx)

A non-capturing group behaves like a normal group in that you can apply quantifiers such as ? or * to it. But it does not capture anything: it gets no group number, so it does not show up in the match results and cannot be the target of a backreference.
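
As a rough illustration, here is a minimal Python sketch that applies this to the pattern from the question. Python is only an assumption here (the question does not name a language), and the variable names are made up for the example. It compares the number of groups in the original pattern with the non-capturing version. Note that a single capturing group always captures one contiguous span, so the closing </a> still ends up inside the captured text and is stripped in a separate step.

```python
import re

# The two sample lines from the question.
html = "<td>text 1</td>\n<td><a href=whatever>this is</a> text 2</td>"

# Original pattern: three capturing groups.
capturing = re.compile(r"<td>(<a href=.+?>)?(.+?(</a>)?.+?)</td>")

# Same pattern with (?:...) around the optional anchor tags: one group left.
non_capturing = re.compile(r"<td>(?:<a href=.+?>)?(.+?(?:</a>)?.+?)</td>")

print(capturing.groups)      # number of capturing groups in the pattern
print(non_capturing.groups)

# With exactly one capturing group, findall returns a list of strings.
cells = non_capturing.findall(html)

# The capture is contiguous, so remove any leftover anchor tags afterwards.
cleaned = [re.sub(r"</?a[^>]*>", "", cell) for cell in cells]
print(cleaned)
```

The non-capturing groups only change which groups are reported; they do not change what the pattern matches.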

Andomar
  • thanks, that's what I need. but it looks like it doesn't do what I want if I nest a non-capturing group inside a capturing group...is that not possible? – toasteroven Nov 20 '09 at 23:28
  • specifically for the second example, if I match with: (?:)(.+?(?:).+?) it doesn't properly match the – toasteroven Nov 20 '09 at 23:30
  • In the regex in your comment, `a href` is not optional. Try `(?:)?(.+?(?:)?.+?)` instead. By the way, if you're parsing HTML, a regex is a pretty bad approach. Try this instead: http://www.codeplex.com/htmlagilitypack – Andomar Nov 20 '09 at 23:45
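
To illustrate the parser-based approach suggested in the comments, here is a small sketch using Python's standard-library html.parser. This is a stand-in for illustration, not the HTML Agility Pack linked above (which is a .NET library), and the class name CellText is hypothetical. It collects the text inside each td element and ignores any tags nested within it.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collects the text content of every <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []       # finished cell texts
        self._in_td = False   # True while inside a <td>
        self._parts = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._parts))
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> arrives here as well.
        if self._in_td:
            self._parts.append(data)


html = "<td>text 1</td>\n<td><a href=whatever>this is</a> text 2</td>"
parser = CellText()
parser.feed(html)
parser.close()
print(parser.cells)
```

Because the parser reports tags and text separately, the anchor tags never need to be matched or stripped by hand.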