4

In the following string of (let's say) zip codes, I am trying to exclude `33333-` from the result.
I do:

String zip = "11111 22222 33333- 44444-4444";
String regex = "\\d{5}(?(?=-)-\\d{4})";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(zip);
while (matcher.find()) { 
   System.out.println(" Found: " + matcher.group());     
}

Expect to get:

Found:  11111  
Found:  22222  
Found:  44444-4444

I am trying to enforce this format:
5 digits, optionally followed by a hyphen and 4 digits. 5 digits followed by just a hyphen should not match.

I get an exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Unknown inline modifier near index 7
\d{5}(?(?=-)(-\d{4}))
       ^
    at java.util.regex.Pattern.error(Unknown Source)
    at java.util.regex.Pattern.group0(Unknown Source)
    at java.util.regex.Pattern.sequence(Unknown Source)
    at java.util.regex.Pattern.expr(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)
    at java.util.regex.Pattern.<init>(Unknown Source)
    at java.util.regex.Pattern.compile(Unknown Source)

Am I not using the conditional lookahead correctly?

tchrist
Cratylus
  • Do you want the last one to match as `44444-4444` or just `44444`? – Kevin K Jan 20 '12 at 17:42
  • I want to get `11111` `22222` `44444-4444` but not the `33333-`. I thought the conditional lookahead would not include `33333-` – Cratylus Jan 20 '12 at 17:57

4 Answers

6

To match every number except `33333-`, use this code:

String zip = "11111 22222 33333- 44444-4444";
String regex = "\\d{5}(?=(-\\d{4}|\\s|$))(-\\d{4})?";
Matcher m = Pattern.compile(regex).matcher(zip);
while (m.find())
    // group() is the whole match; group(1) would be the text captured inside the lookahead
    System.out.printf("Matched: [%s]%n", m.group());

OUTPUT:

Matched: [11111]
Matched: [22222]
Matched: [44444-4444]

Explanation: The lookahead itself acts as the condition. The regex matches a 5-digit number only if it is followed by a hyphen and 4 digits, OR by whitespace, OR by the end of the string. It then optionally consumes the hyphen and 4 digits.

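Your original regex throws an exception because of a syntax error in the `(?(?=-)` part. Java's regex engine does not support the `(?(condition)...)` conditional construct, so it tries to read the `(?` as the start of an inline modifier.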
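For reference, a conditional of the form `(?(?=X)yes|no)` can usually be emulated in Java with an alternation of lookaheads, `(?:(?=X)yes|(?!X)no)`. The sketch below is only an illustration of that idea. The class and helper names (`ConditionalEmulation`, `compiles`, `findAll`) are made up for this example, and the emulated pattern is one possible rewrite of the question's regex, not the pattern used in the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ConditionalEmulation {
    // Hypothetical helper: true if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: collect every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String zip = "11111 22222 33333- 44444-4444";

        // The conditional from the question is rejected by Java.
        System.out.println(compiles("\\d{5}(?(?=-)-\\d{4})")); // false

        // Emulation: if a hyphen follows, require -dddd; otherwise require no hyphen.
        String emulated = "\\d{5}(?:(?=-)-\\d{4}|(?!-))";
        System.out.println(findAll(emulated, zip)); // [11111, 22222, 44444-4444]
    }
}
```

The second branch, `(?!-)`, is what rejects `33333-`: the first branch fails because no 4 digits follow the hyphen, and the second fails because a hyphen does follow.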

anubhava
  • My main problem is why I get the exception. I want to extract the numbers adhering to a specified format using conditionals. – Cratylus Jan 20 '12 at 18:42
  • Actually you can fix exception in your regex by using `"\\d{5}(?:(?=-)-\\d{4})"` however that will **only capture** `44444-4444` – anubhava Jan 20 '12 at 18:56
  • But does this mean that conditional regular expressions based on lookahead do not work in Java? Otherwise, what is the correct syntax? – Cratylus Jan 20 '12 at 19:16
  • I don't understand your regex. `(?=(-\\d{4}|\\s|$))` means lookahead for a hyphen followed by 4 digits OR whitespace OR end of string? Where is the conditional here? Sorry, is my initial regex all wrong? – Cratylus Jan 20 '12 at 20:23
  • In the same sentence you wrote you didn't understand my regex and then you interpreted it right. It does exactly what your intent is that **5 digits optionally followed by a - and 4 digits. 5 digits with just a - (hyphen) is not wanted**. Regarding your RegEx: yes it will definitely not compile as I mentioned earlier to you . – anubhava Jan 20 '12 at 20:38
  • I mean it is a lookahead and not a conditional. How does it work if it is not a conditional? – Cratylus Jan 20 '12 at 21:13
  • Actually it is using lookahead that itself is like a condition. This RegEx means match 5 digit number which must be followed by `- and 4 digits OR a space OR end of string` and then it is optionally matching a text `- and 4 digits`. – anubhava Jan 20 '12 at 21:28
  • Ok. +1. If you could please update your answer to explain that the exception is caused by wrong syntax, and also when a lookahead acts as a conditional for what follows (did I get it?) or add a reference to this, I will gladly mark your answer as accepted. Thanks – Cratylus Jan 20 '12 at 21:44
  • Thanks, explanation added in the answer. – anubhava Jan 20 '12 at 21:51
  • The OP is right and his regex is not wrong. The fact is that the Java engine does not support if-then-else conditional expressions. http://www.regular-expressions.info/conditional.html – Cristiano Jun 21 '16 at 21:37
  • another reference here: http://stackoverflow.com/questions/8072756/does-java-support-if-then-else-regexp-constructsperl-constructs – Cristiano Jun 21 '16 at 21:43
0

You're missing a colon after (?, i.e. use this regex (non-Java-String): \d{5}(?:(?=-)-\d{4}).

However, this might still not produce the result you want. Please post some example input and required output.
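A quick way to see why the colon version may not be enough is to run it against the sample input from the question. This is a minimal sketch. The class name `ColonFixDemo` and the helper `findAll` are hypothetical, and the second pattern (the same group made optional with `?`) is an extra variant added here for comparison only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonFixDemo {
    // Hypothetical helper: collect every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String zip = "11111 22222 33333- 44444-4444";

        // Non-capturing group is mandatory: only ZIP+4 values can match.
        System.out.println(findAll("\\d{5}(?:(?=-)-\\d{4})", zip));
        // prints [44444-4444]

        // Making the group optional lets the bare 33333 through as well.
        System.out.println(findAll("\\d{5}(?:(?=-)-\\d{4})?", zip));
        // prints [11111, 22222, 33333, 44444-4444]
    }
}
```

Neither variant rejects the 5 digits in `33333-`, which is why an extra check on what follows the digits is needed.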

Thomas
  • This produces only `44444-4444`. I was expecting `11111` `22222` `44444-4444`. I thought my regex would match the first 5 digits, and the conditional lookahead would also include `44444-4444` but not `33333-`, since the conditional lookahead would be false – Cratylus Jan 20 '12 at 17:55
0

Your question is a little unclear to me. I suppose you are looking for:

String st = "11111 22222 33333- 44444-4444";
String pattern = "\\d+(- )";
String res  = st.replaceAll(pattern,"");
System.out.println(res);

Output = 11111 22222 44444-4444

RanRag
0
(\d{5}(?!-\s)(?:-\d{4})?)

hence:

String regex = "(\\d{5}(?!-\\s)(?:-\\d{4})?)";
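The negative lookahead `(?!-\s)` rejects 5 digits followed by a hyphen and whitespace. A small sketch of how it behaves, assuming the sample input from the question plus one extra input where `33333-` sits at the very end of the string. The class name `NegativeLookaheadDemo`, the helper `findAll`, and the stricter variant `(?!-(?!\d{4}))` are illustrations added here, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    // Hypothetical helper: collect every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String answerRegex = "(\\d{5}(?!-\\s)(?:-\\d{4})?)";
        // Assumed variant: reject a hyphen that is not followed by 4 digits.
        String stricter = "\\d{5}(?!-(?!\\d{4}))(?:-\\d{4})?";

        System.out.println(findAll(answerRegex, "11111 22222 33333- 44444-4444"));
        // prints [11111, 22222, 44444-4444]

        // With the dangling hyphen at the end of the input, no whitespace follows it.
        System.out.println(findAll(answerRegex, "44444-4444 33333-"));
        // prints [44444-4444, 33333]
        System.out.println(findAll(stricter, "44444-4444 33333-"));
        // prints [44444-4444]
    }
}
```

If the input can end with a dangling hyphen, the stricter lookahead may be the safer choice. For the exact input in the question, both patterns give the same result.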
bluish
Kleenestar