Find duplicate value from string array

Question

I found two way to find the duplicate value from string array.

First way :

private static String FindDupValue(String[] sValueTemp) {
    for (int i = 0; i < sValueTemp.length; i++) {
      String sValueToCheck = sValueTemp[i];
      if(sValueToCheck==null || sValueToCheck.equals(""))continue;
      for (int j = 0; j < sValueTemp.length; j++) {
        if(i==j)continue;
        String sValueToCompare = sValueTemp[j];
        if (sValueToCheck.equals(sValueToCompare)){
          return sValueToCompare;
        }
      }

    }
    return "";

  }

Second way :

private static String FindDupValueUsingSet(String[] sValueTemp) {
    Set<String> sValueSet = new HashSet<String>();
    for(String tempValueSet : sValueTemp) {
      if (sValueSet.contains(tempValueSet))
        return tempValueSet;
      else
        if(!tempValueSet.equals(""))
          sValueSet.add(tempValueSet);
    }
    return "";
  }

Both methods are correct.

My question is which one best method and why? Or is there any other best way to find out duplicate value form an array?

Zakhar · Answer 1 · 2015-12-26T07:33:52.897

Nice thing about Set is that it's add operation returns true if this set did not already contain the specified element.

public static void main(String[] args) {
    Set<String> set = new HashSet<>();
    String[] stringsToTest = {"a", "b", "c", "a"};

    for (String s : stringsToTest) {
        boolean notInSetYet = set.add(s);

        if (!notInSetYet) {
            System.out.println("Duplicate: " + s);
        }
    }
}

Output:

Duplicate: a

score 1 · Answer 2 · answered Dec 25 '15 at 12:26

1

Second way.

Using a Set is much more efficient for this operation because sValueSet.contains(tempValueSet) uses the backing map (and hence hash codes and fast lookup times) instead of fully iterating.

answered Dec 25 '15 at 12:26

Jonah Graham

7,890
23
55

score 1 · Answer 3 · answered Dec 25 '15 at 12:27

1

Both of the approaches are almost similar in terms of algorithmic complexity.

The first approach's complexity is O(N * N), where N is the length of the array. I don't think it's necessary to explain why, but just in case it is - the nested loops take N * N units of time, which gives the complexity.

As for the second approach - having a HashSet allows you to search with constant complexity (O(1)), because the search is based on the hashed value of the String. One can think that this approach is more effective, but in reality it isn't that much, because we need to encounter the insert operation on the HashSet.

Adding to the HashSet is of complexity O(N) (worst case scenario). For N String objects, you might have N insert operations, which again gives complexity of O(N * N).

So, to summarize, both of the approaches are of similar expense. I would prefer the second one, though, as it's a bit more readable.

answered Dec 25 '15 at 12:27

Konstantin Yovkov

62,134
8
100
147

HashSet insertion complexity even in the worst case scenario is still amortized `O(1)`, not `O(n)` – LeleDumbo Dec 25 '15 at 12:35
It's `O(1)` if you know the size of the set, otherwise it's `O(n)` – Konstantin Yovkov Dec 25 '15 at 12:36
no, even if you don't know the size, it's **amortized** `O(1)`. when you don't know the size and it reaches the worst case (current number of items >= available size), the set will grow ONCE based on load factor (and initial capacity). there's nothing going `O(n)` there. Java documentation guarantees that. – LeleDumbo Dec 25 '15 at 12:41
@LeleDumbo it's O(1) per item if the hash spreads the keys *used* (as opposed to potential ones). Worst case is if all the hashcodes collide: through java7 that was a linked list and O(N), but java8 switches to a tree and appears to be O(log N). There were denial-of-service attacks based on this where you could send e.g. Tomcat or IIS an HTTP request containing several thousand URL parameters chosen to collide and it would take hours of CPU to process instead of a tenth of a second. – dave_thompson_085 Dec 26 '15 at 22:18

score 1 · Answer 4 · answered Dec 25 '15 at 13:18

in your first approach in the second loop i believe if you change the loop starting point from j=0 to j=i it will make it faster. because you will avoid comparing 2 values twice

private static String FindDupValue(String[] sValueTemp) {
for (int i = 0; i < sValueTemp.length; i++) {
  String sValueToCheck = sValueTemp[i];
  if(sValueToCheck==null || sValueToCheck.equals(""))continue;
  for (int j = i; j < sValueTemp.length; j++) {
    if(i==j)continue;
    String sValueToCompare = sValueTemp[j];
    if (sValueToCheck.equals(sValueToCompare)){
      return sValueToCompare;
    }
  }

}
return "";

}

Kostas Kryptos · Answer 5 · 2015-12-26T11:50:27.720

1

This seems to be one of the fastest approaches, running in O(n), assuming an amortized O(1) for HashSet.add, plus requiring only one hash computation per iteration by omitting the use of contains. It's true that String caches hashcode (thanks to Jonah), this code generalizes the concept of omitting contains.

private static String FindDupValueUsingSet(String[] sValueTemp) {
    Set<String> sValueSet = new HashSet<String>();
    for(String tempValueSet : sValueTemp)
        if (!tempValueSet.equals("")) //exclude empty Strings (add null checking if required)
            if (!sValueSet.add(tempValueSet))
                return tempValueSet;
    return "";
}

edited Dec 26 '15 at 11:50

answered Dec 25 '15 at 13:49

Kostas Kryptos

4,081
2
23
24

"plus requiring only one hash computation per iteration" hashCode of String (an Immutable) is cached, so no recomputatoin, no matter how many times you use it. http://stackoverflow.com/questions/21000611/is-hash-code-of-java-lang-string-really-cached – Jonah Graham Dec 26 '15 at 09:10
1

Yes that's true for Strings – Kostas Kryptos Dec 26 '15 at 11:44

Find duplicate value from string array

5 Answers5