Normally the . doesn't match a newline unless I tell the engine to do so with the (?s) flag. I tried this regexp in my editor's (UltraEdit v14.10) regexp engine, using Perl-style regexp mode:

(?s).*i

The search text contains multiple lines and each line contains many 'i' characters.

I expect the above regexp to mean: match as many characters as possible (because * is greedy, and with (?s) the . now matches anything, including a newline), up to and including a character 'i'.

This should mean "from the first character to the last 'i' in the last sentence" (greediness should reach the last sentence, right?).

But in UltraEdit's test, the match turns out to be "from the first character to the last 'i' in the first sentence that contains an i". Is this result correct? Have I misinterpreted my regular expression?

e.g. given this text

aaa
bbb
aiaiaiaiaa  
bbbicicid

UltraEdit matches

aaa
bbb
aiaiaiai

But I expect:

aaa
bbb
aiaiaiaiaa  
bbbicici
Alan Moore
JavaMan

3 Answers

Your regex is correct, and so are your expectations of its performance.

This is a long-known bug in UltraEdit's regex implementation, about which I have written to support repeatedly. As far as I know, it still hasn't been fixed. The problem appears to lie in the fact that UE's regex implementation is essentially line-based, and additional lines are taken into the match only if necessary. So .* will match greedily on the current line, but it will not cross a newline boundary if it doesn't have to in order to achieve a match.
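The difference can be reproduced outside the editor. The following is only a sketch in Python: `line_limited_search` is a hypothetical helper written for illustration, not UltraEdit's actual code, and it assumes the "take in more lines only when necessary" behaviour described above. It grows the search window one line at a time and stops at the first window in which the pattern matches.

```python
import re

# Sample text from the question (line 3 ends with two trailing spaces).
TEXT = "aaa\nbbb\naiaiaiaiaa  \nbbbicicid"


def line_limited_search(pattern, text):
    """Hypothetical emulation of a line-based engine: extend the search
    window by one line at a time and return the first match found."""
    regex = re.compile(pattern)
    window = ""
    for line in text.splitlines(keepends=True):
        window += line
        match = regex.match(window)  # anchored at the first character
        if match:
            return match.group()
    return None


# A conventional engine searches the whole text at once.
expected = re.match(r"(?s).*i", TEXT).group()
# The emulation stops as soon as the first three lines allow a match.
observed = line_limited_search(r"(?s).*i", TEXT)

print(repr(expected))
print(repr(observed))
```

Under that assumption, `expected` runs to the last 'i' of the last line, while `observed` stops at the last 'i' of the third line, which is the result reported in the question.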

There are some other subtle bugs with line endings. For example, lookbehind doesn't work across newlines, either.

Write to IDM support, or change to an editor with decent regex support. I did both.

Tim Pietzcker
  • I don't know about the lookbehind bug you mentioned. But could it be that greedily searching through the whole input file is so slow that they decided to change the behaviour this way? Remember, UltraEdit does allow editing input files that are MBs in size. – JavaMan Dec 03 '10 at 19:03
  • EditPadPro handles files of GBs in size and doesn't have these regex restrictions. If I construct a greedy regex, I expect it to work correctly. If this means running out of memory, then it's my problem or the OS's, but the editor shouldn't second-guess me. – Tim Pietzcker Dec 03 '10 at 19:05

Yes, you are right; this looks like a bug.

Your interpretation is correct if you are in Perl mode and not POSIX mode. However, it should apply to POSIX as well.

That said, defining the modifiers the way you do is very rare.

Mostly you provide a pattern with delimiters and put the modifier afterwards, like /.*i/s

But this doesn't matter, because your way is correct too. And if it weren't supported, the match wouldn't include the first newline either.
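To illustrate that the inline form and the trailing-modifier form are interchangeable, here is a small sketch in Python, chosen only because it is easy to run; Python has no /.../s delimiter syntax, so the `re.DOTALL` flag argument stands in for the trailing modifier. This says nothing about UltraEdit's own engine.

```python
import re

# Sample text from the question (line 3 ends with two trailing spaces).
text = "aaa\nbbb\naiaiaiaiaa  \nbbbicicid"

# Inline modifier, as written in the question.
inline = re.search(r"(?s).*i", text).group()

# Modifier supplied separately from the pattern.
by_flag = re.search(r".*i", text, flags=re.DOTALL).group()

# No dot-all mode at all: the dot stops at every newline.
no_flag = re.search(r".*i", text).group()

print(inline == by_flag)
print(repr(no_flag))
```

Both dot-all spellings give the same match. Without the modifier the match cannot contain a newline at all, so a result that spans the first two newlines suggests the modifier was honoured.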

So yes, this is definitely a bug in your program.

The Surrican

You're right that that regex should match the entire string (all 4 lines). My guess is that UltraEdit is attempting to do some sort of optimization by working line by line, and only accumulating new lines "when necessary".
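As a quick check of the "all 4 lines" claim, this Python sketch inspects the span of the match on the sample text from the question. It assumes a conventional backtracking engine and is not a statement about how UltraEdit works internally.

```python
import re

# Sample text from the question (line 3 ends with two trailing spaces).
text = "aaa\nbbb\naiaiaiaiaa  \nbbbicicid"

m = re.search(r"(?s).*i", text)
start, end = m.span()

# Number of lines the match touches: newlines inside the match, plus one.
lines_touched = m.group().count("\n") + 1

print(start, end, len(text))
print(lines_touched)
```

The match starts at offset 0 and ends one character before the end of the text, leaving only the trailing 'd' unmatched.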

Laurence Gonsalves
  • If it were not a bug, I guess that would be the reason they changed the behaviour this way. Probably, for a text editor, greedily searching through the whole file performs too poorly. – JavaMan Dec 03 '10 at 18:59