
This is my sample code:

public String testMethod() {
    String sampleString = "Hi <username>. Is <username> your name?. <username> rocks! <admin> wishes you well. Ask <admin> if you have any trouble!";
    String myRegex = "your regex here";

    Pattern pattern = Pattern.compile(myRegex);
    Matcher matcher = pattern.matcher(sampleString);
    int counter = 0;
    while (matcher.find()) {
        counter++;
    }

    return "Matched substring: " + counter;
}

First, I want to get tags with this pattern: <([a-zA-Z0-9_]+)>. When I use the pattern, I get 5 as a result, since there are 5 tags in sampleString. This works just fine, but I want Matcher to return only unique matches.

Based on the string in the sample code, the result should be 2, since there are 2 unique tags (<username> and <admin>). So I built my regex based on this answer, and now I have this pattern: <([a-zA-Z0-9_]+)>(?!.*\1). I tried the pattern on Regex101 and it works just fine. But when used with the sample code, the result is still 5.

Is there anything wrong with my pattern?

Edit: Just like the linked question, I want to avoid using Maps or Lists. And I want to emphasize that I'm asking why my regex doesn't work in Java when it's supposed to work (based on the Regex101 result).

AceVez
  • You're using regexes for things they're not designed for. Regexes are for finding patterns. `Set`s are the right tool for finding all the unique occurrences of something. Don't try to use regexes to solve everything--that's a common beginner mistake. – ajb Jul 24 '17 at 05:02
  • Why do you want to avoid using `Map` or `List`? Is someone penalizing you $100 every time you use one? If not, what motivation do you have for avoiding what could be the right tool for the job? – ajb Jul 24 '17 at 06:28
  • @ajb Ahahahaa. Good question. I'm just trying to learn what regex can do. – AceVez Jul 24 '17 at 07:59
  • Java regexes have a lot of power. They can be used to find all sorts of complicated patterns. They also have the power to make your code unreadable unnecessarily. Use the power wisely. – ajb Jul 24 '17 at 13:33

2 Answers


Rather than coming up with a complex regex, you can use the simple regex <(\\w+)> and store your results in a Set to get unique matches only:

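// Imports needed at the top of the source file:
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sampleString = "Hi <username>. Is <username> your name?. <username> rocks! <admin> wishes you well. Ask <admin> if you have any trouble!";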
String myRegex = "<(\\w+)>";

Pattern pattern = Pattern.compile(myRegex);
Matcher matcher = pattern.matcher(sampleString);

Set<String> tags = new HashSet<>();

while (matcher.find()) {
    tags.add(matcher.group(1));
}

System.out.printf("tags: %s, count: %d%n", tags, tags.size());

Output:

tags: [admin, username], count: 2
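
As a possible variant of the same idea, here is a minimal sketch that assumes Java 9 or later, where `Matcher.results()` is available. The class name `UniqueTagCount` and the helper `countUniqueTags` are hypothetical names chosen for illustration. Note that `distinct()` still tracks seen values internally, so this only hides the collection rather than avoiding it.

```java
import java.util.regex.Pattern;

public class UniqueTagCount {

    // Hypothetical helper: counts the distinct tag names of the form <name> in the input.
    static long countUniqueTags(String input) {
        return Pattern.compile("<(\\w+)>")
                .matcher(input)
                .results()                       // Java 9+: a stream of MatchResult objects
                .map(result -> result.group(1))  // keep only the captured tag name
                .distinct()                      // drop repeated tag names
                .count();
    }

    public static void main(String[] args) {
        String sampleString = "Hi <username>. Is <username> your name?. <username> rocks! "
                + "<admin> wishes you well. Ask <admin> if you have any trouble!";
        System.out.println("count: " + countUniqueTags(sampleString));
    }
}
```

For the sample string this should report a count of 2, matching the Set-based version.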
anubhava

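In Java code, write the pattern as "<([a-zA-Z0-9_]+)>(?!.*\\1)". The backreference to the first capture group has to be written as \\1 inside a string literal, not \1.

In a Java string literal, \1 is an octal escape (the character U+0001), so the regex engine never receives a backreference and the negative lookahead always succeeds. See more about this: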

What are all the escape characters in Java?
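
To make the difference concrete, here is a minimal sketch that runs both spellings of the pattern against the sample string from the question. The class name `BackreferenceEscapeDemo` and the helper `countMatches` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceEscapeDemo {

    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int counter = 0;
        while (matcher.find()) {
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        String sampleString = "Hi <username>. Is <username> your name?. <username> rocks! "
                + "<admin> wishes you well. Ask <admin> if you have any trouble!";

        // "\1" is an octal escape: the compiler turns it into the character U+0001,
        // so the lookahead only checks that no U+0001 follows, which is always true here.
        String octalEscape = "<([a-zA-Z0-9_]+)>(?!.*\1)";

        // "\\1" reaches the regex engine as \1, a backreference to capture group 1.
        String backreference = "<([a-zA-Z0-9_]+)>(?!.*\\1)";

        System.out.println("single backslash: " + countMatches(octalEscape, sampleString));
        System.out.println("double backslash: " + countMatches(backreference, sampleString));
    }
}
```

With the sample string, the single-backslash pattern should count all 5 tags and the double-backslash pattern should count 2. One caveat: the lookahead searches for the bare tag name anywhere later in the string, so a later tag such as <username2> would also suppress an earlier <username>.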

chengpohi