At the moment I have an Excel sheet with a column holding data in this form:

E 1-6,44-80
E 10-76
E 44-80,233-425
E 19-55,62-83,86-119,200-390
...

I need to be able to capture each range of numbers individually. For example, I would like the first line above to result in "1-6" and "44-80" being captured into their own groups. So, essentially I need to capture a repeating group.

When trying to use this pattern, which uses the general form for capturing repeating groups given by @ssent1 on this question:

E\s(([0-9]{1,4})-([0-9]{1,4}))((?:,([0-9]{1,4})-([0-9]{1,4}))*)

I end up only matching the first and last number ranges. I understand that this is because I'm repeating captured groups rather than capturing a repeating group, but I can't figure out how to correct my pattern. Any help would be greatly appreciated.
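To make the symptom concrete, here is a minimal sketch that prints what each group of the pattern above holds for the last sample line. The class name `RepeatedGroupDemo` and the helper `groups` are made up for illustration; only `java.util.regex` is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The pattern from the question, unchanged.
    private static final Pattern PATTERN = Pattern.compile(
            "E\\s(([0-9]{1,4})-([0-9]{1,4}))((?:,([0-9]{1,4})-([0-9]{1,4}))*)");

    // Hypothetical helper: returns the contents of groups 1..6 for the first
    // match in the line, or an empty list if the line does not match.
    static List<String> groups(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(line);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = groups("E 19-55,62-83,86-119,200-390");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group " + (i + 1) + ": " + g.get(i));
        }
        // group 1: 19-55
        // group 2: 19
        // group 3: 55
        // group 4: ,62-83,86-119,200-390
        // group 5: 200
        // group 6: 390
    }
}
```

Groups 5 and 6 sit inside the repeated part, so they keep only the values from its last iteration (200 and 390); the middle ranges are visible only as part of the text in group 4.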

mesunmoon
  • There is a large number of different tools and programming languages with regular expression support, and they have differences in behavior. Multiple-valued capture support definitely falls into the "differences" category. What are you using? – Mark Reed Jul 18 '22 at 15:08
  • Does this answer your question? [How to capture multiple repeated groups?](https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups) – Mark Reed Jul 18 '22 at 15:11
  • @MarkReed I'm using Java. I figured I may have to use Java's regex library to solve this rather than being able to do this with just a pattern. Does that seem to be right? – mesunmoon Jul 18 '22 at 15:11
  • @MarkReed I'm actually using the pattern given by ssent1 in that post for my regex pattern. The problem is that it fails to capture the repeating group and instead continues to repeat the captured group. – mesunmoon Jul 18 '22 at 15:12

1 Answer

In Java you can use a single capture group together with the \G anchor, which matches at the position where the previous match ended, to get contiguous matches:

(?:^E\h+|\G(?!^),?(\d{1,4}-\d{1,4}))

Example

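import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Either the "E " prefix at a line start, or a range continuing from the previous match.
        String regex = "(?:^E\\h+|\\G(?!^),?(\\d{1,4}-\\d{1,4}))";
        String string = "E 1-6,44-80\n"
                + "E 10-76\n"
                + "E 44-80,233-425\n"
                + "E 19-55,62-83,86-119,200-390\n"
                + "200-390"; // no "E " prefix, so this line is not matched

        // MULTILINE lets ^ match at the start of every line.
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            // Group 1 is null when only the "E " prefix matched.
            if (matcher.group(1) != null) {
                System.out.println(matcher.group(1));
            }
        }
    }
}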

Output

1-6
44-80
10-76
44-80
233-425
19-55
62-83
86-119
200-390
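
The output above is one flat list. If the ranges need to stay grouped by the line they came from, the same pattern can be reused, treating every match where group 1 is null as the start of a new line. This is only a sketch under that assumption; the class name `RangesPerLine` and the method `rangesPerLine` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangesPerLine {
    // Same pattern as in the example above.
    private static final Pattern PATTERN = Pattern.compile(
            "(?:^E\\h+|\\G(?!^),?(\\d{1,4}-\\d{1,4}))", Pattern.MULTILINE);

    // Returns one inner list per "E ..." line, holding that line's ranges in order.
    static List<List<String>> rangesPerLine(String input) {
        List<List<String>> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            if (matcher.group(1) == null) {
                // The prefix alternative matched: a new "E ..." line begins.
                result.add(new ArrayList<>());
            } else {
                // A range can only match right after the prefix or an earlier
                // range, so the current inner list already exists.
                result.get(result.size() - 1).add(matcher.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "E 1-6,44-80\n"
                + "E 10-76\n"
                + "200-390";
        System.out.println(rangesPerLine(input));
        // prints [[1-6, 44-80], [10-76]]
    }
}
```

Each inner list can then be split on "-" to get the start and end of a range as numbers.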
The fourth bird