Turning off greed not working in this regex

Question

I am trying to run the following search (with . made to match newlines either by adding the /s flag in perl or replacing it with \_. in vim):

/<output_channels>.*(?=Story).*?<\/output_channels>/

However the ? isn't turning off greed as it normally does - can anyone explain why? For example, it matches the entire contents of the following file rather than just the first element:

<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel> 
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>

Sorry if I'm missing something obvious.

The RE you give uses a couple of elements that don't work in vim. Not sure if you realize this or not. Check [`:help perl-patterns`](http://vimdoc.sourceforge.net/htmldoc/pattern.html#perl-patterns) for a list of differences. What are you using to do the search? — intuited, Apr 15 '11 at 09:58
@BoltClock Both/either. Ultimately I'll use perl but I find it quicker to test regexes in vim. — tog22, Apr 15 '11 at 11:21

score 1 · Answer 1 · answered Apr 15 '11 at 10:02


The first .* in your regex is still greedy. You only added ? after the second one.
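A minimal sketch of the difference, written with Python's `re` module on the assumption that it treats `.*`, `.*?` and `(?=...)` the same way Perl does for this pattern. The input is a hypothetical variation on the question's file in which both blocks contain "Story", since that is where the greediness of the first `.*` shows up:

```python
import re

# Hypothetical variation on the question's file: both blocks contain "Story".
text = (
    "<output_channels>\n"
    "  <output_channel>Story</output_channel>\n"
    "</output_channels>\n"
    "\n"
    "<output_channels>\n"
    "  <output_channel>Story</output_channel>\n"
    "</output_channels>\n"
)

# Greedy first .* : runs to the end, then backtracks to the LAST "Story".
greedy = re.search(r"<output_channels>.*(?=Story).*?</output_channels>", text, re.S)

# Lazy first .*? : stops at the FIRST "Story" it can reach.
lazy = re.search(r"<output_channels>.*?(?=Story).*?</output_channels>", text, re.S)

# Count how many opening tags each match swallowed.
print(greedy.group(0).count("<output_channels>"))  # 2
print(lazy.group(0).count("<output_channels>"))    # 1
```

The lazy form is not a general fix: if the first block had no "Story", `.*?` would still run on into a later block to find one.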

answered Apr 15 '11 at 10:02

Avi



But the look-ahead will cover that in this case `(?=Story)` – Seth Apr 15 '11 at 10:05

score 1 · Accepted Answer · edited May 23 '17 at 09:58


I put your sample text into a vim buffer, and then executed the command

:%!perl -e '$text = join("", <STDIN>); $text =~ /<output_channels>.*(?=Story).*?<\/output_channels>/s; print $&;'

The result is just the first block of XML. I think this is what you want?

Note that I escaped the / within the regex. Other than this, it is the same one given in your question.
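For anyone who wants to see why the result stops at the first block, here is a rough Python equivalent of that check (Python's `re` is assumed to behave like Perl for these constructs; `re.S` plays the role of `/s`). The greedy `.*` first runs to the end of the input, then backtracks to the last position followed by "Story". In this sample that position is inside the first block, so the lazy `.*?` only has to reach the first closing tag after it:

```python
import re

# The sample text from the question.
sample = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

pattern = re.compile(r"<output_channels>.*(?=Story).*?</output_channels>", re.S)
match = pattern.search(sample)

# Offset just past the first closing tag, i.e. the end of the first block.
first_block_end = sample.index("</output_channels>") + len("</output_channels>")

print(match.span() == (0, first_block_end))  # True: only the first block matched
```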

Also note that the equivalent vim RE would be (tested, works):

<output_channels>\_.*\(Story\)\@=\_.\{-}<\/output_channels>

See :help perl-patterns for a rundown of the differences between perl and vim REs.

Further note that parsing hierarchical markup with regexps has been known to reawaken ancient demons.

edited May 23 '17 at 09:58

Community


answered Apr 15 '11 at 10:16

intuited


Thanks. For what it's worth, your vim RE doesn't work - it'd be nice to know one I could use while testing in vim, but the perl RE is all I really need. – tog22 Apr 15 '11 at 12:41
...though can you explain why the following doesn't work as intended (to capture just the second element in my file) when I switch to a negative lookahead? I have a feeling it's to do with the greediness of the first .* but when I switch this to .*? I get an operator. Is there a way to capture elements not containing 'Story' or am I better off using a tool other than regexps? /\_.*\(story\)\@<!\_.\{-}<\/output_channels>/ – tog22 Apr 15 '11 at 15:13
@tog22: I just tested the vim RE and found that it works okay with both [`/`](http://vimdoc.sourceforge.net/htmldoc/pattern.html#/) and [`matchstr()`](http://vimdoc.sourceforge.net/htmldoc/eval.html#matchstr()). Note that in vim you don't need to (and mustn't) surround an RE with `/` characters; I just left those in to make it similar to the perl-ish version. I've taken them out. – intuited Apr 15 '11 at 17:13

@tog22: If you want to match a string of text that does not contain a particular submatch, you have to use something like `((?!submatch).)*`. I.e., you specify that each position does not match the thing you're avoiding. – intuited Apr 15 '11 at 17:18
@intuited Really appreciate your help, but I'm afraid you'll have to talk me through it more slowly. In what complete expression would I use `((?!submatch).)*` to find an `<output_channels>` element not containing 'submatch'? I tried `.*?((?!Story).)*?<\/output_channels>` but this matches both blocks... – tog22 Apr 18 '11 at 10:17
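One possible complete expression, sketched in Python on the assumption that its `re` module matches Perl's behaviour for these constructs. The attempt above most likely matches both blocks because its leading `.*?` is unconstrained and can absorb "Story" before the guarded part begins. Anchoring on the opening tag and guarding every character up to the closing tag avoids that:

```python
import re

# The sample text from the question.
sample = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

# After the opening tag, consume one character at a time, but only at
# positions where "Story" does not start; stop at the first closing tag.
no_story = re.compile(r"<output_channels>(?:(?!Story).)*?</output_channels>", re.S)

blocks = no_story.findall(sample)
for block in blocks:
    print(block)
```

On this sample the first block is rejected because the guarded dot cannot step over "Story" before reaching the closing tag, so only the second block is returned.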
@tog22: It would be much easier to use an XML parser to do this sort of thing. I'm not sure what's available in perl, but I find that Python's `lxml` module is very good. It mostly consists of bindings to a C library, so perl may have the same. – intuited Apr 18 '11 at 17:41
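As a rough illustration of the parser route, here is a sketch using Python's standard-library `xml.etree.ElementTree` in place of `lxml` (a substitution made only to avoid a third-party dependency). The `<root>` wrapper is a hypothetical addition, needed because the sample has two top-level elements and is not well-formed XML on its own:

```python
import xml.etree.ElementTree as ET

# The sample text from the question.
sample = """<output_channels>
  <output_channel>RSS</output_channel>
  <output_channel>Story</output_channel>
</output_channels>

<output_channels>
  <output_channel>RSS</output_channel>
</output_channels>
"""

# Wrap the fragments in a hypothetical <root> so the parser accepts them.
root = ET.fromstring("<root>" + sample + "</root>")

# Keep only the <output_channels> blocks with no child whose text is "Story".
without_story = [
    block
    for block in root.findall("output_channels")
    if all(ch.text != "Story" for ch in block.findall("output_channel"))
]

for block in without_story:
    print([ch.text for ch in block.findall("output_channel")])  # ['RSS']
```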
