
I have this code:

    String result = text;

    String regex = "((\\(|\\[)(.+)(\\)|\\])){1}?";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(result);

    System.out.println("start");
    System.out.println(result);
    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        System.out.println(matcher.group());
    }
    System.out.println("finish");

And I have a string that I want to match:

Some text sentence or sentences [something 234] (some things)

And the output I get when executing:

start
some text sentence or sentences [something 234] (some things)
Start index: 32 End index: 61 [something 234] (some things)
finish

What I actually want is for it to find the bracketed parts separately: [something 234] as one match and (some things) as a second match.

Can anyone please help me build the regex accordingly? I am not sure how to apply the reluctant quantifier to the whole regular expression, so I wrapped the whole bracketed element in another group. But I don't understand why this reluctant quantifier is acting greedy here. What do I need to change?

Wiktor Stribiżew
Arturas M

1 Answer


The {1} in your regex is redundant, since any element without an explicit quantifier has to be matched exactly once. Making it reluctant doesn't make sense either, because {1} doesn't describe a range of possible repetitions (unlike {min,max}, where adding ? tells the regex engine to keep the number of repetitions as close to min as possible). Here {n} describes an exact number of repetitions, so min = max = n.
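To see this in isolation, here is a small sketch (the class name QuantifierDemo and the helper firstMatch are made up for illustration) that compares a ranged quantifier with an exact one on the input "aaa". Only the ranged quantifier changes behavior when ? is added:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaa";
        // Ranged quantifier: greedy takes as many as it can, reluctant as few as it can.
        System.out.println(firstMatch("a{1,3}", input));  // aaa
        System.out.println(firstMatch("a{1,3}?", input)); // a
        // Exact quantifier: there is nothing to choose between, so ? has no effect.
        System.out.println(firstMatch("a{1}", input));    // a
        System.out.println(firstMatch("a{1}?", input));   // a
    }
}
```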

Now you should be able to solve your problem by making .+ (the content between the brackets) reluctant. To do so, use .+? instead.

So try with:

    String regex = "((\\(|\\[)(.+?)(\\)|\\]))";
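
For completeness, a minimal runnable sketch of the loop from the question with this regex plugged in. The class name BracketMatches and the helper findAll are just names picked for the example, and the input is the string from the question's output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketMatches {
    // Opening bracket, reluctant content, closing bracket.
    private static final Pattern BRACKETED =
            Pattern.compile("((\\(|\\[)(.+?)(\\)|\\]))");

    // Hypothetical helper: collects every bracketed chunk found in the text, in order.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = BRACKETED.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "some text sentence or sentences [something 234] (some things)";
        // Prints "[something 234]" and "(some things)" on separate lines.
        for (String match : findAll(text)) {
            System.out.println(match);
        }
    }
}
```

Note that this pattern does not check that the closing bracket is the same kind as the opening one, so something like "[abc)" would also match.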
Pshemo
  • Hmm, it seems to work, but I still don't get how making the quantifier reluctant for the content between the brackets makes it work. Why put it there when I'm searching for a match of the brackets and the content, not just the content? I'm totally confused, I would never have come up with the idea of putting the reluctant modifier on the content... – Arturas M Jun 05 '16 at 13:11
  • Try to think about what a single result should look like: it should be either `(...)` or `[...]`. So for a string like `aaa (foo) [bar] baz`, the regex should find the (mandatory) `(` and `)` with as small a range of characters between them as possible. That is why we make the `.+`, which represents the characters between the opening `(` or `[` and the closing `)` or `]`, reluctant (minimal). Maybe this tutorial will explain it better: http://www.regular-expressions.info/repeat.html#lazy – Pshemo Jun 05 '16 at 13:16
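
The difference described in that last comment can be checked directly on its example string. The sketch below is only an illustration (GreedyVsReluctant and findAll are hypothetical names); it runs the greedy and the reluctant variant of the inner pattern side by side:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns all matches of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "aaa (foo) [bar] baz";
        // Greedy .+ runs to the end of the input and backtracks to the LAST closing bracket.
        System.out.println(findAll("(\\(|\\[)(.+)(\\)|\\])", text));  // [(foo) [bar]]
        // Reluctant .+? stops at the FIRST closing bracket it can reach.
        System.out.println(findAll("(\\(|\\[)(.+?)(\\)|\\])", text)); // [(foo), [bar]]
    }
}
```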