7

I have a string like

String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";

I need a regex to give me the following output:

number0 foobar
number1 foofoo
number2 bar bar bar bar
number3 foobar

I have tried

Pattern pattern = Pattern.compile("number\\d+(.*)(number\\d+)?");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group());
}

but this gives

number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar
Freek de Bruijn
b3bop

6 Answers

10

So you want number (+ an integer) followed by anything until the next number (or end of string), right?

Then you need to tell that to the regex engine:

Pattern pattern = Pattern.compile("number\\d+(?:(?!number).)*");

In your regex, the greedy .* matched as much as it could, which is everything up to the end of the string. Also, you made the second part (number\\d+)? part of the match itself.

Explanation of my solution:

number    # Match "number"
\d+       # Match one or more digits
(?:       # Match...
 (?!      #  (as long as we're not right at the start of the text
  number  #   "number"
 )        #  )
 .        # any character
)*        # Repeat as needed.
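A minimal runnable sketch of this pattern applied to the question's string. The class name TemperedDotDemo and the method segments are made up for illustration. One detail the regex alone does not handle: each raw match also includes the space just before the next "number" (that space is not the start of "number", so the lookahead lets it through), which is why the sketch trims every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDotDemo {
    // "number" + digits, then any characters as long as none of them starts the text "number"
    private static final Pattern SEGMENT = Pattern.compile("number\\d+(?:(?!number).)*");

    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(input);
        while (matcher.find()) {
            // The match ends with the space before the next "number", so strip it
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        for (String segment : segments(string)) {
            System.out.println(segment);
        }
    }
}
```

Running main prints the four lines the question asks for, one per line.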
Tim Pietzcker
  • @Tim Pietzcker, thanks for the answer! I always enjoy reading your detailed explanations. – aviad Feb 09 '12 at 07:30
0
// In a Java string literal "\1" is an octal escape, so the backreference must be written "\\1"
Pattern pattern = Pattern.compile("\\w+\\d(\\s\\w+)\\1*");
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    System.out.println(matcher.group());
}
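A sketch of the backreference pattern (with the backreference written as "\\1") wrapped in a hypothetical BackreferenceDemo class. It runs both the question's string and the "number4 bar foo bar" case raised in the comment below. Because \1* only repeats the exact text captured by group 1 (here " bar"), the second input yields just "number4 bar".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // A word ending in a digit, one space-prefixed word (group 1),
    // then zero or more exact repeats of whatever group 1 captured
    private static final Pattern SEGMENT = Pattern.compile("\\w+\\d(\\s\\w+)\\1*");

    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(segments("number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar"));
        // " foo" is not a repeat of " bar", so the match stops after the first word
        System.out.println(segments("number4 bar foo bar"));
    }
}
```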
Freek de Bruijn
vidak
  • Nice use of the [backreference](https://docs.oracle.com/javase/tutorial/essential/regex/groups.html)! However, this will not work when trying to match "number4 bar foo bar", which might be what the OP is aiming for (in that case "number4 bar" is returned instead of "number4 bar foo bar"). – Freek de Bruijn Dec 23 '15 at 22:57
0

This happens because .* is a greedy pattern. Use .*? instead of .*:

Pattern pattern = Pattern.compile("number\\d+(.*?)(number\\d+)");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group());
}
shift66
  • That isn't going to work - this matches only `number0`, `number1`, `number2` and `number3`. The second group is optional (and it shouldn't be part of the match anyway). – Tim Pietzcker Feb 09 '12 at 07:15
  • It still doesn't match the right content. The results for the test string are `number0 foobar number1` and `number2 bar bar bar bar number3`. Didn't you test your code? (It also fails if there is an odd number of `number`s in the string.) – Tim Pietzcker Feb 09 '12 at 07:48
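The reluctant-quantifier idea can be made to work if the following "number" is checked with a lookahead instead of being consumed, so it stays available for the next find(). This swaps in a lookahead that the answer itself does not use; the class name ReluctantLookaheadDemo and method segments are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantLookaheadDemo {
    // .*? grows one character at a time and stops at the first position where the
    // lookahead sees optional whitespace plus the next "number<digits>", or the end of input.
    // The lookahead consumes nothing, so the next find() starts right before that number.
    private static final Pattern SEGMENT =
            Pattern.compile("number\\d+.*?(?=\\s*number\\d+|$)");

    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        segments(string).forEach(System.out::println);
    }
}
```

Because the whitespace before the next number is absorbed by \s* inside the lookahead, the matches come out without trailing spaces.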
0

If "foobar" is just an example and you really mean "any word", use the following pattern: (number\\d+)\\s+(\\w+)
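A quick check of this pattern on the question's string, in a hypothetical WordAfterNumberDemo class. Since (\\w+) captures a single word, the third match is "number2 bar" rather than "number2 bar bar bar bar"; the two groups give the number and the word separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordAfterNumberDemo {
    // Group 1: "number<digits>", group 2: the single word that follows it
    private static final Pattern PAIR = Pattern.compile("(number\\d+)\\s+(\\w+)");

    static List<String> matches(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        Matcher matcher = PAIR.matcher(string);
        while (matcher.find()) {
            System.out.println(matcher.group(1) + " -> " + matcher.group(2));
        }
    }
}
```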

AlexR
0

Why don't you just match for number\\d+, query the match location, and do the String splitting yourself?
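A minimal sketch of that approach: find each "number<digits>" with Matcher.find(), record Matcher.start(), and cut the string between consecutive start positions. The class name SplitAtMarkersDemo and method segments are hypothetical; text before the first marker is simply ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitAtMarkersDemo {
    // Only the marker itself is matched; the text in between is cut out with substring
    private static final Pattern MARKER = Pattern.compile("number\\d+");

    static List<String> segments(String input) {
        // 1. Collect the start index of every marker
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = MARKER.matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        // 2. Each segment runs from one marker to the next marker (or the end of input)
        List<String> result = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : input.length();
            result.add(input.substring(starts.get(i), end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        segments(string).forEach(System.out::println);
    }
}
```

A shorter variant of the same idea is to split on whitespace that is followed by a marker, e.g. string.split("\\s+(?=number\\d+)").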

Daniel
-1

The (.*) part of your regex is greedy, so it eats everything from that point to the end of the string. Change it to the non-greedy (reluctant) variant: (.*?)

http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
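A small sketch comparing what find() returns on the question's string for the greedy group, for (.*)? (which only makes the group optional and leaves .* greedy), and for the reluctant (.*?). The class name GreedyVsReluctantDemo and method matches are hypothetical. With the question's trailing (number\\d+)? still optional, the reluctant version matches as little as possible, which is nothing, so each match is just "number<digits>".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Collects every match that find() produces for the given regex
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "number0 foobar number1 foofoo number2 bar bar bar bar number3 foobar";
        // Greedy: (.*) runs to the end of the input, so there is one match
        System.out.println(matches("number\\d+(.*)(number\\d+)?", string));
        // (.*)? is an optional group whose .* is still greedy: same single match
        System.out.println(matches("number\\d+(.*)?(number\\d+)?", string));
        // Reluctant: (.*?) prefers zero characters, and the optional group then matches nothing
        System.out.println(matches("number\\d+(.*?)(number\\d+)?", string));
    }
}
```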

LeleDumbo