I was wondering if it was possible to use negative matching on whole words, so that something like [^(<em>.*?<\/em>)] would match everything but text between (and including) <em>...</em>.

I was thinking about using negative lookahead, but I don't think this will work, as I need to check for the opening <em> as well.

Of course, I could just use the positive regex and then subtract the matches from the original text, but I'm looking for a more 'elegant' solution.

Thanks for any help.

DeX3
  • Related question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm Mar 31 '11 at 22:17

2 Answers

String#split works as a negative match. It returns an array of the parts of the string that do not match the regex.

'XXXXXXXX<em>YYYYYYY</em>ZZZZZZZZ'.split(%r|<em>.*?</em>|)
# => ['XXXXXXXX', 'ZZZZZZZZ']

And if you want it back as a string, just call join.

'XXXXXXXX<em>YYYYYYY</em>ZZZZZZZZ'.split(%r|<em>.*?</em>|).join
# => 'XXXXXXXXZZZZZZZZ'
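
A small sketch of two details that may matter when using split this way. The sample strings and variable names here are made up for illustration; only the regex comes from the example above. It suggests that gsub with an empty replacement gives the same string as split followed by join, and that split keeps a leading empty string when the text starts with a match.

```ruby
# Regex from the example above; everything else is illustrative.
em_block = %r|<em>.*?</em>|

text = 'XXXXXXXX<em>YYYYYYY</em>ZZZZZZZZ'

# Deleting the matches directly yields the same string as split + join.
removed = text.gsub(em_block, '')
# => "XXXXXXXXZZZZZZZZ"

# When the string starts with a match, split keeps a leading empty string.
leading = '<em>a</em>b'.split(em_block)
# => ["", "b"]

# Dropping the empty pieces leaves only the real text segments.
pieces = leading.reject(&:empty?)
# => ["b"]
```

Note that `.` does not match newlines by default in Ruby, so an `<em>` block spanning several lines would need the `m` flag on the regex.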
sawa

The key property of lookaround is that it doesn't consume any of the input. If you want to match everything but a pattern, you want to match the prefix and the suffix of that pattern. To match the suffix, you probably want to consume (and throw away) the pattern that you don't want. But negative lookahead doesn't consume.
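
To make this concrete, here is a rough sketch in Ruby. The sample string and both patterns are illustrative assumptions, not something from the question. The first scan uses only a negative lookahead, so the engine merely steps past the `<` and then happily matches the rest of the tag. The second scan explicitly consumes the unwanted block before capturing the text that follows.

```ruby
text = 'XX<em>YY</em>ZZ'

# Lookahead only: nothing consumes the <em> block, so after skipping the
# single "<" the remainder of the tag and its contents are matched as text.
lookahead_only = text.scan(/(?:(?!<em>).)+/)
# => ["XX", "em>YY</em>ZZ"]

# Consume (and discard) an optional <em>...</em> block first, then capture
# the run of characters up to the next opening <em>.
consuming = text.scan(%r{(?:<em>.*?</em>)?((?:(?!<em>).)+)}).flatten
# => ["XX", "ZZ"]
```

The second pattern is noticeably harder to read than the split approach, which is arguably the more practical choice here.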

Staffan Nöteberg