0

I have a regex with multiple disjunctive capture groups:

(a)|(b)|(c)|...

Is there a faster way than this one to access the index of the first successfully matching capture group?

(matcher is an instance of java.util.regex.Matcher)

int getCaptureGroup(Matcher matcher){
    for(int i = 1; i <= matcher.groupCount(); ++i){
        if(matcher.group(i) != null){
            return i;
        }
    }
    return -1; // no capture group participated in the match
}
Mmmh mmh

2 Answers

1

That depends on what you mean by faster. You can make the code a little more efficient by using start(int) instead of group(int):

if(matcher.start(i) != -1){

If you don't need the actual content of the group, there's no point trying to create a new string object to hold it. I doubt you'll notice any difference in performance, but there's no reason not to do it this way.

But you still have to write the same amount of boilerplate code; there's no way around that. Java's regex flavor is severely lacking in syntactic sugar compared to most other languages.
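For reference, here is a minimal sketch of the whole helper with start(int) swapped in. The class name FirstGroupDemo and the method name firstMatchingGroup are made up for illustration, and the -1 return for "no group matched" is my own assumption about what the caller wants; only Matcher.start(int) and Matcher.groupCount() come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class FirstGroupDemo {

    // Returns the 1-based index of the first capture group that took part in
    // the current match, or -1 if none did (assumed convention).
    // Call this only after a successful find(), matches() or lookingAt().
    static int firstMatchingGroup(Matcher matcher) {
        for (int i = 1; i <= matcher.groupCount(); ++i) {
            // start(i) is -1 when group i did not participate in the match;
            // unlike group(i), it does not build a String.
            if (matcher.start(i) != -1) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(a)|(b)|(c)").matcher("xxb");
        if (m.find()) {
            System.out.println(firstMatchingGroup(m)); // prints 2
        }
    }
}
```

This is still a linear scan over the groups; it only avoids allocating the substring.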

Alan Moore
  • Thanks for your suggestion. By faster I mean that the result is stored in a structure other than an iterative list and can be accessed in better than O(n) time. I suppose I have to copy the Matcher and Pattern code and tweak it myself. – Mmmh mmh Oct 07 '13 at 11:18
-1

I guess the usual pattern looks like this:

if (matcher.find()) {
  String wholeMatch = matcher.group(0);
  String firstCaptureGroup = matcher.group(1);
  String secondCaptureGroup = matcher.group(2);
  //etc....
}

There could be more than one match, so you could use a while loop to go through all of them.
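A rough sketch of that loop, assuming you want the index of the matching group for every match in the input. The names AllMatchesDemo and matchedGroupIndices are invented for this example, and a match in which no group participates is simply skipped, which is my assumption rather than anything stated in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class AllMatchesDemo {

    // For each successive match in the input, records the 1-based index of
    // the first capture group that participated in that match.
    static List<Integer> matchedGroupIndices(Pattern pattern, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {                 // one iteration per match
            for (int i = 1; i <= matcher.groupCount(); ++i) {
                if (matcher.start(i) != -1) {    // group i took part in this match
                    result.add(i);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(a)|(b)|(c)");
        System.out.println(matchedGroupIndices(p, "cab")); // prints [3, 1, 2]
    }
}
```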

Please take a look at the "Group number" section in the Javadoc of java.util.regex.Pattern.

Admit
  • This is not an answer. He is asking for a faster way than his current implementation. – Prabhakaran Ramaswamy Oct 03 '13 at 12:45
  • That could be a typo: instead of return i; it should be return matcher.group(i). Also, I am not the downvoter. – Prabhakaran Ramaswamy Oct 03 '13 at 12:48
  • Actually I just return (and that's intended) the index of the ***first successfully matching capture group*** – Mmmh mmh Oct 03 '13 at 12:50
  • Then maybe it would be easier to try to match each pattern separately, since I see the lexer tag. Like `if (string.matches("pattern1")) { //do something } else if (string.matches("pattern2")) { //do something else }`. For me, capturing groups are more useful for extracting data than for guessing which pattern matches :) – Admit Oct 03 '13 at 13:02
  • Imagine I have **`(abcdef0)|(abcdef1)`**, wouldn't it be more efficient to have only one compiled pattern? – Mmmh mmh Oct 03 '13 at 13:22
  • It's an interesting question. You could run some benchmarks on that and post them here. [About microbenchmarks in Java](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Admit Oct 03 '13 at 13:44