1

Given an arbitrary String that contains zero or more substrings matching a regular expression, how can I count the number of characters in that String that belong to those matching substrings?

Example:

Given a regex that matches any email address and the String:

"I have two email addresses: email@gmail.com and email@hotmail.com"

This would return the int value of 32 (the number of characters in "email@gmail.com" plus "email@hotmail.com").

I'm not being clear enough, it seems. Let's pretend you want to set a limit to the number of characters in a tweet, but you want to allow people to include their email address in the tweet and have it count as zero characters.

Possible method signature of solution:

public int lengthOfSubStringsMatchingRegex(String input, String regex)
Alexander Ivanchenko
Glen Pierce
  • I know how to get the length of an arbitrary String. – Glen Pierce Apr 06 '17 at 23:46
  • possible duplicate of http://stackoverflow.com/questions/2635082/java-counting-of-occurrences-of-a-word-in-a-string – Sumit Gulati Apr 06 '17 at 23:58
  • Not a duplicate in that I'm not asking for the number of occurrences of a String, I'm looking for the length of all Strings that match a regex in my input String. But that question does have some useful information relative to this in it. – Glen Pierce Apr 07 '17 at 00:01
  • All match objects contain a method to get the length. It's probably length(). –  Apr 07 '17 at 03:35

2 Answers

4

Just loop over the matches of your regex with Matcher.find(), and call length() on each matched substring (m.group()) to get its number of characters. Add them to your counter, and that's it.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public int lengthOfSubStringsMatchingRegex(String input, String regex) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);

    int count = 0;
    while (m.find()) {
        count += m.group().length(); // length of the current match
    }

    return count;
}

As a slightly less readable alternative, you can use the offsets directly:

count += m.end() - m.start();

start() returns the start index of the previous match.
end() returns the offset after the last character matched.
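
To make the offset variant concrete, here is a minimal, self-contained sketch run against the example from the question. The class name MatchLengthSketch and the SIMPLE_EMAIL pattern are assumptions made for illustration; the pattern is deliberately simplified and is not a complete email validator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchLengthSketch {

    // Hypothetical, deliberately simplified email pattern (illustration only).
    static final String SIMPLE_EMAIL = "[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+";

    static int lengthOfSubStringsMatchingRegex(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            // end() is exclusive, so end() - start() is the length of this match.
            count += m.end() - m.start();
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "I have two email addresses: email@gmail.com and email@hotmail.com";
        // "email@gmail.com" has 15 characters, "email@hotmail.com" has 17.
        System.out.println(lengthOfSubStringsMatchingRegex(input, SIMPLE_EMAIL)); // prints 32
    }
}
```

Using the offsets avoids creating a substring for every match, which may matter on long inputs with many matches.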

Guillaume F.
1

Java 9+

Here's a single-statement stream-based solution.

Since Java 9, we can use Matcher.results(), which produces a Stream<MatchResult> containing a match result "for each subsequence of the input sequence that matches the pattern".

Then we can map each MatchResult to its matched substring with group() and take that substring's length. To obtain the final value, we just need to add up the lengths.

import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public static int lengthOfSubStringsMatchingRegex(String input, String regex) {
    return Pattern.compile(regex).matcher(input) // produces a Matcher
        .results()                // Stream<MatchResult>
        .map(MatchResult::group)  // Stream<String>
        .mapToInt(String::length) // IntStream
        .sum();
}

main()

public static void main(String[] args) {

    System.out.println(lengthOfSubStringsMatchingRegex("a_!_b__c_d_e", "\\p{Punct}+"));
    System.out.println(lengthOfSubStringsMatchingRegex("_?_a_b__c_de_", "\\p{Punct}+"));
}

Output:

7
8
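
Tying this back to the tweet scenario in the question, the stream approach could be used to compute how many characters count toward a limit when matches are free. This is only a sketch: the class TweetLengthSketch, the method countedLength, and the simplified SIMPLE_EMAIL pattern are hypothetical names introduced here, not part of either answer.

```java
import java.util.regex.Pattern;

final class TweetLengthSketch {

    // Hypothetical, deliberately simplified email pattern (illustration only).
    static final Pattern SIMPLE_EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Number of characters that count toward the limit:
    // the total length minus every character inside a match of 'exempt'.
    static int countedLength(String input, Pattern exempt) {
        int exemptChars = exempt.matcher(input)
            .results()                          // Stream<MatchResult>, Java 9+
            .mapToInt(r -> r.end() - r.start()) // length of each match
            .sum();
        return input.length() - exemptChars;
    }

    public static void main(String[] args) {
        String tweet = "I have two email addresses: email@gmail.com and email@hotmail.com";
        // 65 characters in total, 32 of them inside email addresses.
        System.out.println(countedLength(tweet, SIMPLE_EMAIL)); // prints 33
    }
}
```

A caller would then compare countedLength(tweet, SIMPLE_EMAIL) against the limit instead of tweet.length().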
Alexander Ivanchenko