15

Need a Java function to find intersection of two strings. i.e. characters common to the strings.

Example:

String s1 = new String("Sychelless");
String s2 = new String("Sydney");
abatishchev
  • 98,240
  • 88
  • 296
  • 433
Deepak
  • 2,094
  • 8
  • 35
  • 48

10 Answers10

25

Using HashSet<Character>:

HashSet<Character> h1 = new HashSet<Character>(), h2 = new HashSet<Character>();
for(int i = 0; i < s1.length(); i++)                                            
{
  h1.add(s1.charAt(i));
}
for(int i = 0; i < s2.length(); i++)
{
  h2.add(s2.charAt(i));
}
h1.retainAll(h2);
Character[] res = h1.toArray(new Character[0]);

This is O(m + n), which is asymptotically optimal.

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
  • what is O(m+n) and how do u arrive at that complexity and what do mean by asymptotically optimal – Deepak Dec 15 '10 at 09:34
  • 13
    `m` and `n` are the lengths of the strings. Adding, removing, and checking a HashSet item are all O(1). So it's basically m (add) + n (add) + m (one check for every element in h1) + m (up to m removals) + m (converting to character array). So that's 4m + n, which is O(m + n) (the constant drops out). O is Big O. Asymptotically optimal means that there is no algorithm with a smaller big O. – Matthew Flaschen Dec 15 '10 at 09:37
  • 2
    The "O(m + n)" stuff is used for analyzing the time and space requirements of algorithms. See: http://en.wikipedia.org/wiki/Big_O_notation – Jesper Dec 15 '10 at 13:46
  • @MonaJalal, it's just required so that toArray returns an array of type Character. This is a consequence of Java's somewhat incomplete generic system (it was added on later, so they couldn't do everything a new system like C# could). See https://docs.oracle.com/javase/7/docs/api/java/util/AbstractCollection.html#toArray%28T[]%29 . – Matthew Flaschen Oct 24 '15 at 23:05
  • 1
    This approach of set fails for following input "aa","aa" . – pravin kottawar May 12 '19 at 05:00
8

Extract the characters

String.toCharArray

Put them in a Set Find the intersection

Set.retainAll
saugata
  • 2,823
  • 1
  • 27
  • 39
7

Most basic approach:

String wordA = "Sychelless";  
String wordB = "Sydney";  
String common = "";  

for(int i=0;i<wordA.length();i++){  
    for(int j=0;j<wordB.length();j++){  
        if(wordA.charAt(i)==wordB.charAt(j)){  
            common += wordA.charAt(i)+" ";  
            break;
        }  
    }  
}  
System.out.println("common is: "+common);  
abatishchev
  • 98,240
  • 88
  • 296
  • 433
jmj
  • 237,923
  • 42
  • 401
  • 438
  • 1
    Output from this is `"common is: S y e e "`. I would not expect a correct solution to include duplicates (at least not if both words don't have the letters multiple times). – Rasmus Faber Dec 15 '10 at 09:39
  • 2
    This is `O(m * n)` because of the nested loop. Separately, you can easily make the String creation more efficient with a `StringBuilder`, since you know the upper bound of the size ahead of time. Rasmus is also right that the intersection should be a set (no duplicates). – Matthew Flaschen Dec 15 '10 at 09:39
  • The duplication can be fixed very easy adding a condition one line before appending to the "common" String like: if(!common.contains(wordA.charAt(i)+"")) – Abel Morelos Mar 12 '12 at 17:59
3

More detail on saugata's response (appeared while I was writing this): -

public static void main(String[] args) {
    String s1 = "Seychelles";
    String s2 = "Sydney";
    Set<Character> ss1 = toSet(s1);
    ss1.retainAll(toSet(s2));
    System.out.println(ss1);
}

public static Set<Character> toSet(String s) {
    Set<Character> ss = new HashSet<Character>(s.length());
    for (char c : s.toCharArray())
        ss.add(Character.valueOf(c));
    return ss;
}
Jim Downing
  • 1,481
  • 12
  • 29
2

I think the algorithm you are looking for is the problem of the longest common subsequence

Armen Tsirunyan
  • 130,161
  • 59
  • 324
  • 434
1

By means of Guava this task seems much easier:

String s1 = new String("Sychelless");
String s2 = new String("Sydney");
Set<String> setA = Sets.newHashSet(Splitter.fixedLength(1).split(s1));
Set<String> setB = Sets.newHashSet(Splitter.fixedLength(1).split(s2));
Sets.intersection(setA, setB);
Sergii Shevchyk
  • 38,716
  • 12
  • 50
  • 61
1

Found same question here, refer this

Implementing an efficent algorithm to find the intersection of two strings

Community
  • 1
  • 1
Vicky
  • 9,515
  • 16
  • 71
  • 88
1

Optimized solution:

public static String twoStrings(String s1, String s2){

    HashSet<Character> stringOne =  new HashSet<Character>(), stringTwo = new HashSet<Character>();  
    int stringOneLength = s1.length();
    int stringTwoLength = s2.length();
    for(int i=0; i<stringOneLength || i<stringTwoLength; i++) {
        if(i < stringOneLength)
            stringOne.add(s1.charAt(i));
        if(i < stringTwoLength)
            stringTwo.add(s2.charAt(i));
    }
    stringOne.retainAll(stringTwo);

    return stringOne.toString();
}
Bishal Jaiswal
  • 1,684
  • 13
  • 15
0

I have used TreeSet. And retainAll() in TreeSet to get matched elements.

Oracle Doc:

retainAll(Collection<?> c)

Retains only the elements in this set that are contained in the specified collection (optional operation).

String s1 = new String("Sychelless");
String s2 = new String("Sydney");

Set<Character> firstSet = new TreeSet<Character>();
for(int i = 0; i < s1.length(); i++) {
    firstSet.add(s1.charAt(i));
}

Set<Character> anotherSet = new TreeSet<Character>();
for(int i = 0; i < s2.length(); i++) {
    anotherSet.add(s2.charAt(i));
}

firstSet.retainAll(anotherSet);
System.out.println("Matched characters are " + firstSet.toString());//print common strings

//output > Matched characters are [S, e, y]
Blasanka
  • 21,001
  • 12
  • 102
  • 104
-5
s1.contains(s2) returns true;
s1.indexOf(s2) returns 0. 
s1.indexOf("foo") returns -1

For more sophisticated cases use class Pattern.

fglez
  • 8,422
  • 4
  • 47
  • 78
AlexR
  • 114,158
  • 16
  • 130
  • 208