5

I used to think that HashSet is a pretty fast data structure, because it uses hashes (and is in turn implemented on top of HashMap). I was solving some problems and decided to check its performance, so here it is:

You are given an array of numbers, e.g. [11, 3, 11, 11, 3, 2, 0, -2, 2]. You are supposed to write a function that returns the numbers that appear an odd number of times.

Here is my solution:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class OddNumInArray {

    public static List<Integer> oddNumList(int[] ar) {
        // Declared as Collection so that remove(n) autoboxes n and calls
        // remove(Object), rather than List.remove(int index).
        Collection<Integer> l = new ArrayList<>();
        for (int n : ar) {
            if (l.contains(n)) {
                l.remove(n);
            } else {
                l.add(n);
            }
        }
        return (List<Integer>) l;
    }

    public static Set<Integer> oddNumHSet(int[] ar) {
        Set<Integer> l = new HashSet<>();
        for (int n : ar) {
            if (l.contains(n)) {
                l.remove(n);
            } else {
                l.add(n);
            }
        }
        return l;
    }

    public static void main(String[] args) {
        int[] a1 = new int[10000000];
        // Note: only the first 10 of the 10,000,000 elements are randomized;
        // the rest stay 0.
        for (int i = 0; i < 10; i++) {
            a1[i] = new Random().nextInt(5);
        }
        long cur = System.nanoTime();
        System.out.println(oddNumList(a1));
        long c1 = System.nanoTime() - cur;
        System.out.println("TIME CONSUMED:" + c1);
        cur = System.nanoTime();
        System.out.println(oddNumHSet(a1));
        long c2 = System.nanoTime() - cur;
        System.out.println("TIME CONSUMED:" + c2);
        System.out.println("c1/c2*100: " + ((double) c1 / c2 * 100));
    }
}

And here is the output:

[1, 0]
TIME CONSUMED:101804000
[0, 1]
TIME CONSUMED:183261000
c1/c2*100: 55.55137208680516

So why is the implementation with ArrayList almost two times quicker than the one with HashSet? Thank you.

George Revkov
  • Obviously, it depends on your use case. – Harry Oct 29 '14 at 06:07
  • I don't think this is a duplicate? – Evan Knowles Oct 29 '14 at 06:10
  • Why is it a duplicate? I ask for a link, then. I searched; this question was not asked before. – George Revkov Oct 29 '14 at 06:11
  • Obviously, it depends on your use case. **HashSet** ensures there are no duplicates and gives you an O(1) contains() method, but doesn't preserve order. **ArrayList** doesn't ensure there are no duplicates, and contains() is O(n), but you can control the order of the entries. In your case it checks for duplicates on every insertion, so that takes time... But there are cases where just iteration is needed, and then you can find HashSet being faster. – Harry Oct 29 '14 at 06:12
  • @EvanKnowles This is a duplicate because the OP is attempting to run a microbenchmark with absolutely no controls: No multiple runs, no warmup phase, etc. The level of noise in such a benchmark is high enough to plausibly explain the entire difference. – chrylis -cautiouslyoptimistic- Oct 29 '14 at 06:14
  • @chrylis - I think this must be a duplicate of `ArrayList vs HashSet` instead of *how to do Microbenchmarking*? – TheLostMind Oct 29 '14 at 06:15
  • @TheLostMind agreed. – Thihara Oct 29 '14 at 06:18
  • The *actual* answer to this question is "this doesn't demonstrate anything about the relative speed of `ArrayList` and `HashSet`". – chrylis -cautiouslyoptimistic- Oct 29 '14 at 06:22
  • @chrylis Ok, now I am getting more and more confused. I read the link you mentioned. It has a point. But I can still give several tens of results, all of them showing that ArrayList was faster (I am just talking about time consumption, not memory). And this question - even if you say it's only relatively faster - is not answered there. – George Revkov Oct 29 '14 at 06:26
  • @chrylis - I agree.. The OP is trying to compare Apples and Oranges. – TheLostMind Oct 29 '14 at 06:27
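
As the comments point out, a one-shot timing like this proves very little; a harness such as JMH is the standard tool for Java microbenchmarks. Purely as an illustration, here is a minimal plain-Java sketch with a warmup phase (so the JIT gets a chance to compile the hot code) and a median over repeated runs. FairerBenchmark and measureNanos are invented names; the sketch reuses the question's OddNumInArray methods and, unlike the original loop, fills the whole array.

import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

public class FairerBenchmark {

    // Runs the task a few times untimed so the JIT can warm up, then
    // reports the median of several timed runs.
    static long measureNanos(Consumer<int[]> task, int[] data) {
        for (int i = 0; i < 5; i++) {
            task.accept(data);                  // warmup, results discarded
        }
        long[] times = new long[9];
        for (int i = 0; i < times.length; i++) {
            long start = System.nanoTime();
            task.accept(data);
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[times.length / 2];         // median is robust to outliers
    }

    public static void main(String[] args) {
        int[] a1 = new int[10000000];
        Random rnd = new Random();
        for (int i = 0; i < a1.length; i++) {   // fills the whole array this time
            a1[i] = rnd.nextInt(5);
        }
        System.out.println("ArrayList median: "
                + measureNanos(OddNumInArray::oddNumList, a1) + " ns");
        System.out.println("HashSet median:   "
                + measureNanos(OddNumInArray::oddNumHSet, a1) + " ns");
    }
}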

1 Answer

9

ArrayList doesn't have code to check for duplicates, so it just adds elements as you add them. A HashSet, on the other hand, is meant to hold only unique elements, so it makes a check on every insertion to prevent duplicate elements from getting in.

TheLostMind
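
To make the answer's point concrete, here is a deliberately simplified sketch of the extra work a hash-based add performs compared to a plain list append. This is not the JDK source, only its rough shape; AddCostSketch and the helper names are invented for illustration.

import java.util.ArrayList;
import java.util.List;

public class AddCostSketch {

    // ArrayList.add amounts to an append into a backing array
    // (amortized O(1)); there is no duplicate check of any kind.
    static void arrayListAdd(Object[] elements, int size, Object e) {
        elements[size] = e;                            // capacity growth omitted
    }

    // HashSet.add must hash the element, locate its bucket, and compare
    // against everything already in that bucket before inserting.
    static boolean hashSetAdd(List<List<Object>> buckets, Object e) {
        int index = Math.floorMod(e.hashCode(), buckets.size());
        for (Object existing : buckets.get(index)) {   // the duplicate check
            if (existing.equals(e)) {
                return false;                          // already present, reject
            }
        }
        buckets.get(index).add(e);
        return true;
    }

    public static void main(String[] args) {
        List<List<Object>> buckets = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            buckets.add(new ArrayList<>());
        }
        System.out.println(hashSetAdd(buckets, 42));   // true: inserted
        System.out.println(hashSetAdd(buckets, 42));   // false: duplicate
    }
}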
  • So, if I add my own implementation of Hash*Collection_name* via HashMap, it will be faster, right? – George Revkov Oct 29 '14 at 06:09
  • @GeorgeRevkov That doesn't make any sense - a HashMap by definition can't have duplicate keys, because that's the way they're looked up. A specific hash-code is made for each item - what does it mean if you have two items with the same hash code? How do you find the second item? – furkle Oct 29 '14 at 06:11
  • @GeorgeRevkov - Actually, a `HashSet` uses a `HashMap` behind the scenes, so it will not be faster. An `ArrayList` uses an array behind the scenes, so it will be faster than both `HashMap` and `HashSet`. – TheLostMind Oct 29 '14 at 06:14
  • I am talking about the way HashSet is implemented. It uses the value added to the HashSet as a key for the HashMap and then adds a default value to the map. If I understand correctly, the most time-consuming part is checking whether it has dups or not, which is implemented in HashSet (am I right here?). If I implement a HashSet, but without checking for dups, I will save time, won't I? – George Revkov Oct 29 '14 at 06:15
  • @GeorgeRevkov - It will still be *slower* than an `ArrayList` :P – TheLostMind Oct 29 '14 at 06:18
  • Hm, interesting. I thought that ArrayList was one of the slowest data structures. Apparently, it is not. Thanks for the answer. – George Revkov Oct 29 '14 at 06:21
  • I actually find this question very useful. I used a HashSet when generating permutations with Heap's algorithm, because I wanted to eliminate duplicates arising from identical characters in the string I was permuting. I changed the HashSet to an ArrayList and it became 8 times faster. – lordvidex Dec 03 '20 at 22:16
  • Which one is faster depends mainly on the size of your set. A HashSet can do a quick lookup by hash, but it has some overhead. For large sets the overhead is less than the list lookup; for small sets the list is faster. – FrankyHollywood Jan 25 '23 at 11:34
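
To put rough numbers on that last comment, here is a quick (and still unscientific) sketch that probes a tiny and a large collection of each kind. ContainsCrossover and its helpers are invented names, and the exact figures will vary by machine and JVM; the point is only the trend.

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ContainsCrossover {

    static volatile long sink;                  // keeps the JIT from dropping the loop

    // Times `lookups` contains() calls; about half of the probes are present.
    static long timeContains(Collection<Integer> c, int lookups) {
        Random rnd = new Random(42);            // fixed seed: same probes for both
        long start = System.nanoTime();
        long hits = 0;
        for (int i = 0; i < lookups; i++) {
            if (c.contains(rnd.nextInt(c.size() * 2))) {
                hits++;
            }
        }
        sink = hits;
        return System.nanoTime() - start;
    }

    static void run(int size, int lookups) {
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < size; i++) {
            list.add(i);
            set.add(i);
        }
        System.out.println("size " + size
                + "  ArrayList: " + timeContains(list, lookups) + " ns"
                + "  HashSet: " + timeContains(set, lookups) + " ns");
    }

    public static void main(String[] args) {
        run(5, 1000000);      // tiny collection: the linear scan tends to win
        run(100000, 1000);    // large collection: hashing tends to win
    }
}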