String s = #Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,
#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,
#Section251234,Main,First/HS/12345/Jack/M,200011 11.00

Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234), the dates (200010, 200011) and the values (10.00, 11.00, -2.00) associated with it using regex. Sometimes a single line contains one date/value pair and sometimes two, which is what makes the regex confusing. So at the end of the day, there will be 3 different groups we want to extract.

I tried

#Section(\d+)(?:(?!#Section\d).)*\bJack/M,(\d+)\h+(\d+(?:\.\d+)?)\s(\d+)\h+([-+]?\d+(?:\.\d+)?)\b

See it in action at https://regex101.com/r/JaKeGg/1. It returns 5 groups instead of 3, and it does not match at all when a line has only one value, so I need help with this.

Wiktor Stribiżew
Ajm Kir
  • Have you considered using substitution? Like this: https://regex101.com/r/6Votk8/1 and just concatenating or only using the `$1` you desire? Might do the trick. – sniperd Nov 17 '22 at 15:36
  • If you have this string `200010 10.00 200011` you can not get 200010 and 200011 in a single group – The fourth bird Nov 17 '22 at 15:36
  • @sniperd the thing is that the line #Section251234,Main,First/HS/12345/Jack/M,200011 11.00 is not recognized by the regex even though it should be, because it doesn't have the $4 and $5 substitutions – Ajm Kir Nov 17 '22 at 15:47
  • Is it possible to do this using filter, @Thefourthbird? – Ajm Kir Nov 17 '22 at 15:54
  • You might use a different approach using a `Scanner` and collect it all: `List jackStuff = new Scanner(s).useDelimiter("\\R").tokens().filter(line -> line.contains("/Jack/M")).map(line -> line.replaceAll(".*Jack/M,(.+)", "$1")).map(line -> line.split(",")).collect(Collectors.toList());` – g00se Nov 17 '22 at 16:36
  • You should always accept an answer if you have got the answer. If you do not know how to accept an answer, [here](https://meta.stackexchange.com/a/5235/676168) is the help topic. – Arvind Kumar Avinash Aug 03 '23 at 20:26

2 Answers


You might use a pattern with 2 capture groups, and then post-process the group 2 value to combine the numbers that should be grouped together.

As the dates and the values in the example strings seem to come in pairs, you can split the group 2 value from the regex on whitespace and build 2 lists, using the modulo operator to separate the even and odd positions.

#Section(\d+)\b(?:(?!#Section\d).)*\bJack/M,(\d+\h+[-+]?\d+(?:\.\d+)?(?:\s+\d+\h+[-+]?\d+(?:\.\d+)?)*)


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Group 1: section number. Group 2: all date/value pairs that follow "Jack/M,".
        String regex = "#Section(\\d+)\\b(?:(?!#Section\\d).)*\\bJack/M,(\\d+\\h+[-+]?\\d+(?:\\.\\d+)?(?:\\s+\\d+\\h+[-+]?\\d+(?:\\.\\d+)?)*)";
        String string = "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
                + "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
                + "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";

        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            List<String> group2 = new ArrayList<>(); // dates
            List<String> group3 = new ArrayList<>(); // values

            System.out.println("Group 1: " + matcher.group(1));
            // Even positions are dates, odd positions are values.
            String[] parts = matcher.group(2).split("\\s+");
            for (int i = 0; i < parts.length; i++) {
                if (i % 2 == 0) {
                    group2.add(parts[i]);
                } else {
                    group3.add(parts[i]);
                }
            }
            System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
            System.out.println("Group 3: " + Arrays.toString(group3.toArray()));
        }
    }
}

Output

Group 1: 250342
Group 2: [200010, 200011]
Group 3: [10.00, -2.00]
Group 1: 251234
Group 2: [200011]
Group 3: [11.00]

If you want to collect the values from all matches together, you can create the 3 lists before the loop and print them after it.

List<String> group1 = new ArrayList<>();
List<String> group2 = new ArrayList<>();
List<String> group3 = new ArrayList<>();

while (matcher.find()) {
    group1.add(matcher.group(1));
    String[] parts = matcher.group(2).split("\\s+");
    for (int i = 0; i < parts.length; i++) {
        if (i % 2 == 0) {
            group2.add(parts[i]);
        } else {
            group3.add(parts[i]);
        }
    }
}
System.out.println("Group 1: " + Arrays.toString(group1.toArray()));
System.out.println("Group 2: " + Arrays.toString(group2.toArray()));
System.out.println("Group 3: " + Arrays.toString(group3.toArray()));

Output

Group 1: [250342, 251234]
Group 2: [200010, 200011, 200011]
Group 3: [10.00, -2.00, 11.00]


The fourth bird

I think it is quite difficult to accomplish what you want using a single regex. According to another SO question, you can't get multiple matches out of the same capturing group in your regex: when the group repeats, only its last match is actually captured.
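
As a minimal sketch of that limitation (the class name `RepeatedGroupDemo`, the method `lastCapture` and the pattern are made up for illustration and are not taken from the question), the following repeats one capturing group over two date/value pairs and prints what the group holds afterwards:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Hypothetical pattern: one date/value pair, repeated one or more times.
    // Group 1 sits inside the repetition, so each iteration overwrites it.
    static final Pattern PAIRS =
            Pattern.compile("(?:(\\d{6})\\s+[-+]?\\d+\\.\\d+\\s*)+");

    // Returns the content of group 1 after the whole input has matched, or null.
    static String lastCapture(String input) {
        Matcher m = PAIRS.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Two dates are matched, but only the last one is kept in group 1.
        System.out.println(lastCapture("200010 10.00 200011 -2.00")); // prints 200011
    }
}
```

The first date, 200010, is matched but cannot be read back from group 1, which is why the values have to be collected in a loop or post-processed.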

My suggestion is to split your string by line in Java, iterate through the lines, check whether a line contains the substring you are searching for ("Jack/M"), and then extract the different parts with simpler regex patterns instead of trying to match one long regex against the whole string.

A good walkthrough on how to find all matches for a regex in a string: https://www.tutorialspoint.com/getting-the-list-of-all-the-matches-java-regular-expressions
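
One possible sketch of this line-by-line approach, under some assumptions: the class name `LineByLineDemo`, the method `extract`, and the two small patterns are invented here for illustration, and the marker is passed as "/Jack/M," (with the trailing comma) so that only the text after it is scanned for date/value pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineDemo {
    static final String SAMPLE =
            "#Section250342,Main,First/HS/12345/Jack/M,200010 10.00 200011 -2.00,\n"
            + "#Section250322,Main,First/HS/12345/Aaron/N,200010 17.00,\n"
            + "#Section250399,Main,First/HS/12345/Jimmy/N,200010 12.00,\n"
            + "#Section251234,Main,First/HS/12345/Jack/M,200011 11.00";

    // Two simple patterns instead of one long one (both are assumptions).
    static final Pattern SECTION = Pattern.compile("#Section(\\d+)");
    static final Pattern PAIR = Pattern.compile("(\\d+)\\h+([-+]?\\d+(?:\\.\\d+)?)");

    // Returns three lists: section numbers, dates, values.
    static List<List<String>> extract(String input, String marker) {
        List<String> sections = new ArrayList<>();
        List<String> dates = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String line : input.split("\\R")) {
            int at = line.indexOf(marker);
            if (at < 0) {
                continue; // line does not mention the marker
            }
            Matcher section = SECTION.matcher(line);
            if (section.find()) {
                sections.add(section.group(1));
            }
            // Only scan the text after the marker, so "12345" is not picked up.
            Matcher pair = PAIR.matcher(line.substring(at + marker.length()));
            while (pair.find()) {
                dates.add(pair.group(1));
                values.add(pair.group(2));
            }
        }
        return Arrays.asList(sections, dates, values);
    }

    public static void main(String[] args) {
        List<List<String>> result = extract(SAMPLE, "/Jack/M,");
        System.out.println("Sections: " + result.get(0)); // [250342, 251234]
        System.out.println("Dates:    " + result.get(1)); // [200010, 200011, 200011]
        System.out.println("Values:   " + result.get(2)); // [10.00, -2.00, 11.00]
    }
}
```

Because `find()` is called in a loop on each line, it does not matter whether a line carries one date/value pair or several.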

damianr13