Use regex to get 2 specific groups of substring

Question

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00

Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it, using regex.

I tried something like this https://regex101.com/r/4te0Lg/1 but it still does not give the right result.

.Section(\d+(?:\.\d+)?).*/Jack/M

Yellow · Answer 1 · 2022-08-02T23:48:28.673

If the only parts of each section that change are the section number, the person's name and the last value (as in your example), you can build a pattern easily: take one of the sections where Jack appears and replace the numbers you want with capturing groups.

Example:

#Section250342,Main,First/HS/12345/Jack/M,2000 10.00

becomes:

#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})

If the section substring keeps the format but the other parts of it may change then just replace the rest like this:

#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})

I'm assuming that "Main" is a class, "First/HS/..." is a path, and that the last value always has exactly 2 decimal places.

\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
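
As a rough check of the constructs listed above, here is a minimal sketch (the class name GroupDemo and the helper groups are made up for illustration). It assumes the decimal point is written as \. so that it matches only a literal dot, and it shows that the non-capturing group does not get a group number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {

    // Same shape as the pattern above, with the decimal point escaped.
    static final Pattern SECTION = Pattern.compile(
        "#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");

    // Hypothetical helper: returns {section, value}, or null when the input does not match.
    static String[] groups(String section) {
        Matcher m = SECTION.matcher(section);
        if (!m.matches()) {
            return null;
        }
        // (?:\w+/)* is non-capturing, so only the two (...) groups are numbered.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("#Section250342,Main,First/HS/12345/Jack/M,2000 10.00");
        System.out.println(g[0] + " " + g[1]);

        // Number of capturing groups in the pattern.
        System.out.println(SECTION.matcher("").groupCount());

        // An unescaped dot matches any character; an escaped dot matches only a literal dot.
        System.out.println("10x00".matches("\\d+.\\d{2}"));
        System.out.println("10x00".matches("\\d+\\.\\d{2}"));
    }
}
```

Run against the first section from the question, this should print the section number and value, then a group count of 2, then true and false for the two dot checks.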

For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html

A simple Java example of how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher:

import java.util.regex.*;

public class GetMatch {

    public static void main(String[] args) {

        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})"); //the dot is escaped so it only matches a literal decimal point
        Matcher m;
        String[] tokens = s.split(",(?=#)"); //split the sections into different strings
        
        for(String t : tokens) //checks every string that we got with the split
        {   
            m = p.matcher(t);
            if(m.matches()) //if the string matches the pattern then print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
        }
    }
}

The fourth bird · Answer 2 · 2022-08-02T22:01:14.957

You could use 2 capture groups, and use a tempered greedy token approach so that the match does not cross the next #Section followed by a digit.

#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b

Explanation

#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character, as long as #Section followed by a digit does not start at that position
\bJack/M, A word boundary, then match Jack/M,
\d+\h+ Match 1+ digits and 1+ horizontal whitespace characters
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary


In Java:

String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
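
Because this pattern does not need the input to be split first, it can be applied to the whole string with Matcher.find(). A minimal sketch, assuming the sections are joined into one string as in the question (the class name FindJackSections and the helper extract are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindJackSections {

    // Pattern from this answer; \h (horizontal whitespace) needs Java 8 or later.
    private static final Pattern JACK = Pattern.compile(
        "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Hypothetical helper: returns one "section=value" entry per Jack/M section.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = JACK.matcher(input);
        // find() moves on to the next match each time it is called.
        while (m.find()) {
            results.add(m.group(1) + "=" + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                 + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                 + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                 + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";

        for (String entry : extract(s)) {
            System.out.println(entry);
        }
    }
}
```

With the sample string, this should print 250342=10.00 and 251234=11.00. The Aaron and Jimmy sections are skipped because the tempered token stops before the next #Section.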
