I'm trying to scan a given string for a number. The number cannot be after "v/v./vol/vol.", and cannot be inside parentheses. Here's what I have:

NSString *regex = @"(?i)(?<!v|vol|vol\\.|v\\.)\\d{1,4}(?![\\(]{0}.*\\))";
NSLog(@"Result: %@", [@"test test test 4334 test test" stringByMatching:regex]);
NSLog(@"Result: %@", [@"test test test(4334) test test" stringByMatching:regex]);
NSLog(@"Result: %@", [@"test test test(vol.4334) test test" stringByMatching:regex]);

Infuriatingly, this does not work. My regex can be separated into four parts:

(?i) - make regex case insensitive

(?<!v|vol|vol\\.|v\\.) - negative look-behind assertion for v/v./vol/vol.

\\d{1,4} - the number I'm looking for, 1-4 digits.

(?![\\(]{0}.*\\)) - negative look-ahead assertion: the number cannot be followed by a ) unless a ( comes before that ).

Maddeningly, if I take out the look-behind assertion, it works. What's the issue here? I'm using RegexKitLite, which uses the ICU regex syntax.
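
One way to probe the look-ahead on its own is to port just that fragment to another engine. The sketch below uses Python's `re` module as a stand-in for ICU (an assumption; only syntax both engines share is used), and the helper `first_number` and the third sample string are made up for illustration. Since `[\(]{0}` means "exactly zero opening parentheses", it matches the empty string, so the look-ahead should behave like plain `(?!.*\))`:

```python
import re

# Assumption: Python's re stands in for ICU; this fragment avoids
# engine-specific syntax such as variable-width look-behind.
ORIGINAL_LOOKAHEAD = r"\d{1,4}(?![\(]{0}.*\))"
SIMPLIFIED_LOOKAHEAD = r"\d{1,4}(?!.*\))"  # same pattern with the {0} group dropped


def first_number(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


samples = [
    "test test test 4334 test test",
    "test test test(4334) test test",
    "number 12 first (then parentheses)",  # illustrative string, not from the post
]

for text in samples:
    print(repr(text),
          first_number(ORIGINAL_LOOKAHEAD, text),
          first_number(SIMPLIFIED_LOOKAHEAD, text))
```

In this port both patterns agree on every sample, and a number is rejected whenever any `)` appears later on the line, whether or not the number itself sits inside the parentheses.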

Keng
Nick Locking

2 Answers


Your negative lookbehind is positioned incorrectly. Lookbehinds do not modify the input position, so your negative lookbehind should come after your \d{1,4} expression:

(?i)\\d{1,4}(?<!v|vol|vol\\.|v\\.)(?![\\(]{0}.*\\))

Alternatively, just use a negative lookahead to accomplish the same purpose:

(?i)(?!v|vol|vol\\.|v\\.)\\d{1,4}(?![\\(]{0}.*\\))
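
Because lookarounds are zero-width, their position in the pattern decides which text they inspect. A small sketch for checking that, written for Python's `re` module rather than ICU (an assumption; `re` only accepts fixed-width look-behinds, so just the `v.` alternative is used, and `first_match` is a hypothetical helper):

```python
import re

# Assumption: Python's re, not ICU. Only the fixed-width "v." alternative
# is tested because re rejects variable-width look-behinds.
BEFORE_DIGITS = re.compile(r"(?i)(?<!v\.)\d{1,4}")  # inspects the text just before the number
AFTER_DIGITS = re.compile(r"(?i)\d{1,4}(?<!v\.)")   # inspects the text ending at the last digit


def first_match(pattern, text):
    """Return the first match of a compiled pattern, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(0) if match else None


for text in ["page 4334", "v.4334", "V.7"]:
    print(repr(text), first_match(BEFORE_DIGITS, text), first_match(AFTER_DIGITS, text))
```

In this sketch the trailing look-behind never rejects anything, since the text ending at the match is always a digit, while the leading one blocks the first digit after `v.` but lets the search resume one digit later. Whether the ICU build behind RegexKitLite behaves the same way is worth checking with a `vol.` number outside parentheses.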
Alex
  • I’ve noticed that a variable-width look-behind in the Java `Pattern` class can severely impact performance. I haven’t used the ICU libraries though, just drooled over them, so I don’t know whether it’s still true there. – tchrist Nov 22 '10 at 22:04
  • Quite right on the lookbehind issue - I misread the docs. However, it still fails to pick up numbers that aren't inside parentheses. Is my not-inside-parentheses logic flawed? Example: @"String of letters 703 (1234) (more words a number 2 here) (more letters)" – Nick Locking Nov 23 '10 at 07:42

Finally ended up with this regex:

(?i)\\d{1,4}(?<!v|vol|vol\\.|v\\.)(?![^\\(]*\\))

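The negative look-ahead needed to change: [^\\(]*\\) only matches when a ) is reached without crossing a ( first, so the number is rejected only when it sits inside parentheses. Passes all my tests. Thanks to Alex for identifying the positioning of my NLB being wrong.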
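
For anyone who wants to replay the tests outside RegexKitLite, here is a rough port of the final pattern to Python's `re` module. This is a sketch under assumptions: `re` is not ICU and rejects variable-width look-behinds, so the four alternatives are split into four separate look-behinds (which should be equivalent, since "none of these match" is the same as "each one fails"). The name `first_number` is made up for the example.

```python
import re

# Assumption: Python's re as a stand-in for ICU. The alternation inside the
# look-behind is split into fixed-width look-behinds that must all pass.
FINAL = re.compile(
    r"(?i)\d{1,4}"
    r"(?<!v)(?<!vol)(?<!vol\.)(?<!v\.)"  # look-behind kept after the digits, as in the post
    r"(?![^\(]*\))"                      # fail if a ')' comes before any '('
)


def first_number(text):
    """Return the first number matched by FINAL, or None (hypothetical helper)."""
    match = FINAL.search(text)
    return match.group(0) if match else None


tests = [
    "test test test 4334 test test",
    "test test test(4334) test test",
    "test test test(vol.4334) test test",
    "String of letters 703 (1234) (more words a number 2 here) (more letters)",
]

for text in tests:
    print(first_number(text), "<-", text)
```

In this port the first and last strings yield a number and the two parenthesized ones yield nothing. None of these strings has a `vol.` number outside parentheses, so that case would need its own test.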

Nick Locking