Here is what I need to do:

I have a long string of the form

com.example.TEXT A TO BE EXTRACTED at com.example.TEXT B to be extracted at org.xample.SOME OTHER TEXT at...

I would like to get

  • TEXT A TO BE EXTRACTED
  • TEXT B TO BE EXTRACTED
  • ...

but not SOME OTHER TEXT

I am not terribly good at regexes, and even less so in Java. In JavaScript I can get the first match with

var re = /com\.example\.(.*) at/;
s = 'com.example.abcde at';
var m = s.match(re);

which would yield `abcde` in `m[1]`.

How can I

  • do the equivalent in Java

  • get all matches

The context here is an Android app. I have come across references to the Apache Commons Lang `StringUtils` class and its `substringBetween` method. Quite apart from the fact that I cannot locate the relevant JAR file, I would really like to avoid inflating my app with one more JAR just for this need.

I should mention that I am using Java 8 and do not need to target anything less than Android 4.4.2.

DroidOS
  • You can use this regex in Java: [`com\.example\.(.*?) at`](https://regex101.com/r/DURofv/1) – anubhava Mar 10 '17 at 15:14
  • Lazy dot matching will answer your second question. The first one is answered here: [Java Regex Capturing Groups](http://stackoverflow.com/a/17969620/3832970). – Wiktor Stribiżew Mar 10 '17 at 15:22
  • Your title doesn't seem to match your question -- is this about getting a range of values or matching a pattern? – Hank D Mar 10 '17 at 16:04
  • Hmmm... I cannot think of a better title, but I will have a go at it. All I want is this: given a string similar to `START match this1 END START match this2 END ALTSTART don't match this1 END START match this3 END...`, I would like to end up with `[match this1, match this2, match this3]` to use in whatever I need to do next. – DroidOS Mar 10 '17 at 16:10
  • @anubhava thank you. I am only vaguely aware of lazy matching but I was under the impression that it would be greedy and do an overmatch? Could you elaborate on your comment? – DroidOS Mar 10 '17 at 16:13
  • Greedy `.*` will match the longest possible string before matching `" at"`, while lazy `.*?` will stop at the next `" at"` – anubhava Mar 10 '17 at 16:24

1 Answer

A regex cannot repeat a capturing group and capture every iteration of that group in a single match. Most regex engines, Java's included, allow arbitrary repetition of a capturing group, but they keep only the last iteration's match for that group. By calling `Matcher.find()` in a loop instead, you can find each match in turn and read its capturing groups before moving on to the next one.
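To make the "last match only" behaviour concrete, here is a minimal sketch. The class name `RepeatedGroupDemo`, the helper `lastCapture`, and the comma-separated pattern are made up for illustration and are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical helper: matches a comma-separated list of words using a
    // single capturing group that is repeated, then returns group(1).
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(?:(\\w+),?)+").matcher(input);
        // group(1) holds only the final iteration of the repeated group
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "a" and "b" were matched by the group too, but are not retained
        System.out.println(lastCapture("a,b,c")); // prints "c"
    }
}
```

The earlier iterations are consumed by the match but cannot be read back from the `Matcher`, which is why a `find()` loop is the usual approach.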

Here is an example of such a loop, taken from the Oracle documentation:

// console is a java.io.Console, e.g. obtained from System.console()
Pattern pattern =
        Pattern.compile(console.readLine("%nEnter your regex: "));

Matcher matcher =
        pattern.matcher(console.readLine("Enter input string to search: "));

boolean found = false;
// Each call to find() advances to the next match in the input
while (matcher.find()) {
    console.format("I found the text" +
            " \"%s\" starting at " +
            "index %d and ending at index %d.%n",
            matcher.group(),
            matcher.start(),
            matcher.end());
    found = true;
}
if (!found) {
    console.format("No match found.%n");
}
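Applied to the string in the question, the same loop might look like the sketch below. It uses the lazy pattern suggested in the comments and reads `group(1)` rather than the whole match. The class name `ExtractExample` and the method `extract` are hypothetical names chosen for this example. Only `java.util.regex` is used, so no extra JAR is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractExample {
    // Lazy (.*?) stops at the first " at" that follows "com.example."
    private static final Pattern PATTERN =
            Pattern.compile("com\\.example\\.(.*?) at");

    // Hypothetical helper: collects group(1) of every match, in order
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "com.example.TEXT A TO BE EXTRACTED at "
                + "com.example.TEXT B to be extracted at "
                + "org.xample.SOME OTHER TEXT at...";
        // prints [TEXT A TO BE EXTRACTED, TEXT B to be extracted]
        System.out.println(extract(s));
    }
}
```

Two caveats with this sketch: if the text to be extracted can itself contain `" at"`, the lazy match will stop there and return a truncated result, and `.` does not match line terminators unless the pattern is compiled with `Pattern.DOTALL`.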
Will Barnwell