Find match String using java regular expression with criteria

Question

In Java, consider the following list of strings. The input arrives as any one of these values:

"59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)" OR
"59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee" OR
"59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y" OR
"59USD-300kg-25mb_4G-48p/min(Incl. tax)" OR
"59USD-300kg-25mb_4G" OR
"59USD-300kg" OR
"59USD"

Broadly, the hyphen (-) separates the parts of this string.

I want to get the part of the string that contains a given keyword or parameter, for example:

String str = "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)";

If the keyword or parameter is USD, the result should be:

String expectString = "59USD";

String sourceStr = "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee";

If the keyword or parameter is gb, the result should be:

String expectString = "2gb+1gb_Toffee";

String sourceStr = "59USD-300kg-25mb_4G-48p/min(Incl. tax)";

If the keyword or parameter is min, the result should be:

String expectString = "48p/min(Incl. tax)";

Roberto Mozzicato · Answer 1 · 2022-02-03T22:38:39.837

A simple solution is to split the string on the dashes ("-"), then iterate over the parts and check each one for your keyword. You have to decide what to do when there are multiple matches or no matches at all. The following code contains two basic implementations: one collects all matches in a list, the other stops after the first match. The first returns an empty list when there are no matches; the second returns null.

import java.util.ArrayList;
import java.util.List;

public class KeywordMatcher {

    private static List<String> getKeywordMatches(String s, String keyword) {
        List<String> ret = new ArrayList<>();
        String[] parts = s.split("-");
        for (String part : parts) {
            if(part.contains(keyword))
                ret.add(part);
        }
        
        return ret;
    }

    private static String getFirstKeywordMatch(String s, String keyword) {
        String[] parts = s.split("-");
        for (String part : parts) {
            if(part.contains(keyword))
                return part;
        }
        
        return null;
    }

    public static void main(String[] args) {
        String s ="59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-65USD-1km(mm/hr)";
        
        System.out.println(getKeywordMatches(s, "USD")); // prints [59USD, 65USD]
        System.out.println(getKeywordMatches(s, "min")); // prints [48p/min(Incl. tax)]

        System.out.println(getFirstKeywordMatch(s, "USD")); // prints 59USD
        System.out.println(getFirstKeywordMatch(s, "min")); // prints 48p/min(Incl. tax)

    }
}

A more sophisticated approach involves searching your string for the next divider ("-") and the next keyword. Iterating the string until its end and keeping track of the relative position of dividers and keywords gets you to the same result in a more memory-efficient way (since you don't create any new object in memory unlike the "split" approach). However the implementation can be quite cumbersome and difficult to read, so I suggest the one described above, unless you have specific performance requirements or the strings to be searched are MB-sized.
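The index-based scan described above could look roughly like the following. This is only a minimal sketch, not the implementation the answer author had in mind: the class and method names (KeywordScanner, firstPartContaining) are hypothetical, and it assumes the keyword is non-empty and contains no dash. It avoids building the array of parts; the only string it creates is the result.

```java
public class KeywordScanner {

    // Returns the first dash-delimited part of s that contains keyword,
    // or null if the keyword does not occur.
    // Assumption: keyword is non-empty and does not contain '-'.
    static String firstPartContaining(String s, String keyword) {
        int hit = s.indexOf(keyword);
        if (hit < 0) {
            return null;
        }
        // Nearest divider before the hit; lastIndexOf returns -1 if there is none,
        // so start becomes 0 for the first part.
        int start = s.lastIndexOf('-', hit) + 1;
        // Nearest divider after the hit, or the end of the string for the last part.
        int end = s.indexOf('-', hit + keyword.length());
        if (end < 0) {
            end = s.length();
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-65USD-1km(mm/hr)";
        System.out.println(firstPartContaining(s, "USD"));
        System.out.println(firstPartContaining(s, "min"));
        System.out.println(firstPartContaining(s, "km"));
    }
}
```

Collecting all matches instead of the first would need a loop that restarts the search after each part's closing divider.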

EDIT: As I commented below, the other answer's regular-expression approach is unfortunately extremely inefficient. The following snippet compares the two:

    private static String getFirstKeywordMatch(String s, String keyword) {
        String[] parts = s.split("-");
        for (String part : parts) {
            if(part.contains(keyword))
                return part;
        }
        
        return null;
    }

    // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private static String lookForKeyword(String message, String keyword) {
        //System.out.println("Looking for keyword \"" + keyword + "\" in string \"" + message + "\"");
        String pattern = "^.*?-?([^-]*" + keyword +"[^-]*)-?.*$";
        Matcher matcher = Pattern.compile(pattern).matcher(message);
        if (matcher.matches()) {
            return matcher.group(1);
        }
        
        return null;
    }

    public static void main(String[] args) {
        String s ="59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-65USD-1km(mm/hr)";
        
        long start=System.currentTimeMillis();
        for(int i=0; i<10000000; i++) {
            lookForKeyword(s, "USD");
        }
        System.out.println("Elapsed ms: " + (System.currentTimeMillis()-start));
        
        start=System.currentTimeMillis();
        for(int i=0; i<10000000; i++) {
            getFirstKeywordMatch(s, "USD");
        }
        System.out.println("Elapsed ms: " + (System.currentTimeMillis()-start));
    }

On my machine the regex approach takes about 4 times as long as my approach. So while I'm usually a regex ambassador, unfortunately this is not one of the best use cases for them.

Regarding the answer below by nquincampoix, which uses regular expressions: unfortunately that solution is extremely inefficient from a performance point of view, being about 4 times slower than my solution (which is itself not particularly optimized). — Roberto Mozzicato, Feb 03 '22 at 22:31
I agree with you about the performance concern. But the OP did not ask about an optimal solution. The question title is "Find match String using java regular expression with criteria". So I just tried to answer it, with regexp. — nquincampoix, Jul 17 '22 at 07:17

Nowhere Man · Answer 2 · 2022-02-03T13:04:06.827

If the structure of the input string is consistent, it can be described by a regular expression with named groups, and the group names can then be used to get the appropriate "field" from the matched string.

The pattern for a group is (?<USD>[^-]+): the group name goes in angle brackets, and [^-]+ matches one or more non-dash characters.

The first group is followed by nested optional named groups, one for each additional field.

String[] strs = {
    "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)",
    "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee",
    "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y",
    "59USD-300kg-25mb_4G-48p/min(Incl. tax)",
    "59USD-300kg-25mb_4G",
    "59USD-300kg",
    "59USD"
};
Pattern data = Pattern.compile("(?<USD>[^-]+)(-(?<kg>[^-]+)(-(?<mb>[^-]+)(-(?<min>[^-]+)(-(?<y>[^-]+)(-(?<gb>[^-]+)(-(?<km>[^-]+))?)?)?)?)?)?");
for (String str : strs) {
    Matcher m = data.matcher(str);
    if (m.matches()) {
        System.out.println(str);
        System.out.println("\tUSD:\t" + m.group("USD"));
        System.out.println("\tkg :\t" + m.group("kg"));
        System.out.println("\tmb :\t" + m.group("mb"));
        System.out.println("\tmin:\t" + m.group("min"));
        System.out.println("\ty  :\t" + m.group("y"));
        System.out.println("\tgb :\t" + m.group("gb"));
        System.out.println("\tkm :\t" + m.group("km"));
        System.out.println("----");
    }
}

Output:

59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)
    USD:    59USD
    kg :    300kg
    mb :    25mb_4G
    min:    48p/min(Incl. tax)
    y  :    70y
    gb :    2gb+1gb_Toffee
    km :    1km(mm/hr)
----
59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee
    USD:    59USD
    kg :    300kg
    mb :    25mb_4G
    min:    48p/min(Incl. tax)
    y  :    70y
    gb :    2gb+1gb_Toffee
    km :    null
----
59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y
    USD:    59USD
    kg :    300kg
    mb :    25mb_4G
    min:    48p/min(Incl. tax)
    y  :    70y
    gb :    null
    km :    null
----
59USD-300kg-25mb_4G-48p/min(Incl. tax)
    USD:    59USD
    kg :    300kg
    mb :    25mb_4G
    min:    48p/min(Incl. tax)
    y  :    null
    gb :    null
    km :    null
----
59USD-300kg-25mb_4G
    USD:    59USD
    kg :    300kg
    mb :    25mb_4G
    min:    null
    y  :    null
    gb :    null
    km :    null
----
59USD-300kg
    USD:    59USD
    kg :    300kg
    mb :    null
    min:    null
    y  :    null
    gb :    null
    km :    null
----
59USD
    USD:    59USD
    kg :    null
    mb :    null
    min:    null
    y  :    null
    gb :    null
    km :    null
----
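To connect this to the question's keyword parameter, the keyword can be used directly as the group name. The following is a sketch under that assumption; the class and method names (NamedGroupLookup, fieldFor) are hypothetical. Note that the lookup is positional: it returns whatever sits in the slot named after the keyword, and it only works for strings with the fixed field order shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupLookup {

    // Same pattern as above, compiled once and reused.
    private static final Pattern DATA = Pattern.compile(
        "(?<USD>[^-]+)(-(?<kg>[^-]+)(-(?<mb>[^-]+)(-(?<min>[^-]+)(-(?<y>[^-]+)(-(?<gb>[^-]+)(-(?<km>[^-]+))?)?)?)?)?)?");

    // Returns the field stored in the group named after keyword, or null if
    // the string does not match, the field is absent, or no such group exists.
    static String fieldFor(String s, String keyword) {
        Matcher m = DATA.matcher(s);
        if (!m.matches()) {
            return null;
        }
        try {
            return m.group(keyword); // null when the optional group did not participate
        } catch (IllegalArgumentException e) {
            return null; // the pattern has no group with this name
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("59USD-300kg-25mb_4G-48p/min(Incl. tax)", "min"));
        System.out.println(fieldFor("59USD-300kg", "gb"));
    }
}
```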

score 0 · Accepted Answer · answered Feb 03 '22 at 13:19

If you are comfortable with regexp, you could do this:

    void lookForKeyword(String message, String keyword) {
        System.out.println("Looking for keyword \"" + keyword + "\" in string \"" + message + "\"");
        String pattern = "^.*?-?([^-]*" + keyword +"[^-]*)-?.*$";
        Matcher matcher = Pattern.compile(pattern).matcher(message);
        if (matcher.matches()) {
            System.out.println("Found : \"" + matcher.group(1) + "\"");
        }
    }

    void test() {
        lookForKeyword("59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)", "USD");
        lookForKeyword("59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee", "gb");
        lookForKeyword("59USD-300kg-25mb_4G-48p/min(Incl. tax)", "min");
    }

Output:

Looking for keyword "USD" in string "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee-1km(mm/hr)"
Found : "59USD"
Looking for keyword "gb" in string "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee"
Found : "2gb+1gb_Toffee"
Looking for keyword "min" in string "59USD-300kg-25mb_4G-48p/min(Incl. tax)"
Found : "48p/min(Incl. tax)"
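One caveat with building the pattern by string concatenation: a keyword containing regex metacharacters, such as "(Incl" or "+1gb", would be interpreted as pattern syntax. A possible variant, sketched here with a hypothetical class name (QuotedKeywordRegex), escapes the keyword with Pattern.quote and uses find() rather than matching the whole string. It returns the match instead of printing it, and is not the answer's original method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedKeywordRegex {

    // Returns the first dash-delimited part containing keyword, or null if none.
    static String lookForKeyword(String message, String keyword) {
        // Pattern.quote makes the keyword match literally, whatever it contains.
        Pattern p = Pattern.compile("[^-]*" + Pattern.quote(keyword) + "[^-]*");
        Matcher m = p.matcher(message);
        // find() locates the leftmost match, which starts at the beginning of a part
        // because [^-]* extends the match backwards and forwards up to the dividers.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "59USD-300kg-25mb_4G-48p/min(Incl. tax)-70y-2gb+1gb_Toffee";
        System.out.println(lookForKeyword(s, "gb"));
        System.out.println(lookForKeyword(s, "(Incl"));
    }
}
```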
