7

I'm trying be able to compare two Strings and identify duplicate words. For example;

String1 = "Hello, my name is John."
String2 = "Can you tell me your name please?"

Comparing String1 and String2 would return the word; "name".

I know it is possible to split these two strings into an array of words, and then iterate over each word of each String in a 2-D array. However this is computationally expensive at O(n^2) and I was wondering if there is a faster way of doing this?

Thanks.

EDIT: Changed the example for clarity.

Faruk Sahin
  • 8,406
  • 5
  • 28
  • 34
Dan_Dan_Man
  • 504
  • 2
  • 6
  • 19

2 Answers2

13

After getting the strings to word arrays:

You can add all the elements in the first array to a hashmap and then scan the second array to see if each of the elements exists in the hashmap. Since access time to a hashmap is O(1), this will be O(n+m) time complexity.

If you do not want to use extra space, you can sort both of the arrays in O(nlogn) and then compare the items in O(n+m) which would give you O(nlogn) in total.

Faruk Sahin
  • 8,406
  • 5
  • 28
  • 34
6

One simple solution is to use the Sets.intersection method of Guava's Sets. It is pretty easy:

String s1 = "Hello, my name is John.";
String s2 = "Can you tell me your name?";
Splitter splitter = Splitter.onPattern("\\W").trimResults().omitEmptyStrings();
Set<String> intersection = Sets.intersection(//
        Sets.newHashSet(splitter.split(s1)), //
        Sets.newHashSet(splitter.split(s2)));
System.out.println(intersection);

Output:

[name]

You can also find more information on algorithms to detect Set intersection on this thread.

Community
  • 1
  • 1
Alex
  • 25,147
  • 6
  • 59
  • 55