For best performance, if the strings can be long and you need to support all Unicode characters (including those outside the Basic Multilingual Plane), use a `Set<Integer>` and `retainAll()`, where each integer value is a Unicode code point.
In Java 8, that can be done with this code:
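```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

private static int countDistinctCommonChars(String s1, String s2) {
    // Distinct code points of each string (HashSet: mutable, O(1) average lookups)
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
    set1.retainAll(set2); // keep only code points present in both strings
    return set1.size();
}
```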
If you instead want the common characters returned, you can do this:
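```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

private static String getDistinctCommonChars(String s1, String s2) {
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
    set1.retainAll(set2); // keep only code points present in both strings
    // Sort the common code points and build a String from them
    int[] codePoints = set1.stream().mapToInt(Integer::intValue).toArray();
    Arrays.sort(codePoints);
    return new String(codePoints, 0, codePoints.length);
}
```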
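If you would rather get the common characters in the order they first appear in `s1` instead of sorted by code point, a minimal variant is sketched below. It assumes the same code-point approach; the class and method names (`OrderedCommonChars`, `getCommonCharsInOrder`) are hypothetical, and it swaps in a `LinkedHashSet`, whose `retainAll()` keeps the insertion order of the surviving elements.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class OrderedCommonChars {
    // Hypothetical variant: keeps the common code points in the order
    // they first appear in s1 instead of sorting them.
    static String getCommonCharsInOrder(String s1, String s2) {
        // LinkedHashSet remembers insertion order; retainAll() preserves it.
        Set<Integer> set1 = s1.codePoints().boxed()
                .collect(Collectors.toCollection(LinkedHashSet::new));
        // set2 is only read (contains), so any Set works here.
        Set<Integer> set2 = s2.codePoints().boxed()
                .collect(Collectors.toSet());
        set1.retainAll(set2);
        StringBuilder sb = new StringBuilder();
        for (int cp : set1) {
            sb.appendCodePoint(cp); // appends 1 or 2 chars depending on cp
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getCommonCharsInOrder("hello", "lend")); // el
        System.out.println(getCommonCharsInOrder("lend", "hello")); // le
    }
}
```

Unlike the sorted version, which returns `"el"` for both argument orders, this variant returns `"le"` for `("lend", "hello")` because `l` comes first in `"lend"`.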
Test
```java
public static void main(String[] args) {
    test("hello", "lend");
    test("lend", "hello");
    test("mississippi", "expressionless");
    test("expressionless", "comprehensible");
    test("", ""); // Supplementary characters, i.e. 2 chars (a surrogate pair) per code point
}

private static void test(String s1, String s2) {
    System.out.printf("Found %d (\"%s\") common chars between \"%s\" and \"%s\"%n",
                      countDistinctCommonChars(s1, s2),
                      getDistinctCommonChars(s1, s2),
                      s1, s2);
}
```
Output
```
Found 2 ("el") common chars between "hello" and "lend"
Found 2 ("el") common chars between "lend" and "hello"
Found 3 ("ips") common chars between "mississippi" and "expressionless"
Found 8 ("eilnoprs") common chars between "expressionless" and "comprehensible"
Found 2 ("") common chars between "" and ""
```
Note that the last test uses characters from the Unicode "Domino Tiles" block (U+1F030 to U+1F09F), i.e. characters that are stored in Java strings as surrogate pairs.
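To see why code points matter for such characters, the sketch below compares the code-point approach with a naive `String.chars()` approach on two different domino tiles, U+1F030 and U+1F031. Both encode to the same high surrogate (`\uD83C`), so intersecting raw `char` values reports a spurious match. The class and method names (`SurrogateDemo`, `countByCodePoints`, `countByChars`) are hypothetical, introduced only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class SurrogateDemo {
    // Code-point approach: each supplementary character is one set element.
    static int countByCodePoints(String s1, String s2) {
        Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
        Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toCollection(HashSet::new));
        set1.retainAll(set2);
        return set1.size();
    }

    // Naive approach for comparison: intersects UTF-16 char values,
    // so a surrogate pair is split into two separate elements.
    static int countByChars(String s1, String s2) {
        Set<Integer> set1 = s1.chars().boxed().collect(Collectors.toCollection(HashSet::new));
        Set<Integer> set2 = s2.chars().boxed().collect(Collectors.toCollection(HashSet::new));
        set1.retainAll(set2);
        return set1.size();
    }

    public static void main(String[] args) {
        String tileA = new String(Character.toChars(0x1F030)); // \uD83C\uDC30
        String tileB = new String(Character.toChars(0x1F031)); // \uD83C\uDC31
        System.out.println(tileA.length());                          // 2 (one surrogate pair)
        System.out.println(tileA.codePointCount(0, tileA.length())); // 1
        System.out.println(countByChars(tileA, tileB));      // 1: only the shared high surrogate
        System.out.println(countByCodePoints(tileA, tileB)); // 0: no common character
    }
}
```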