1

I know that we prefer ArrayList over HashSet when we need to store duplicates, and HashSet uses hashCode() function to calculate an index for each element in its array.

So, this means if we want to store a single element, then ArrayList should take less time than HashSet. Please correct me if I am wrong anywhere.

But when I am checking performance through code I am getting different behavior.

Case 1:

import java.util.*;
class HashsetVSArraylist
{
 public static void main(String args[])
 {
  ArrayList<Integer> a1=new ArrayList<Integer>();
  long nanos = System.nanoTime();

  a1.add(1);

  System.out.println("ArrayList Time:"+(System.nanoTime()-nanos)+"ns");

  HashSet<Integer> h1=new HashSet<Integer>();
  nanos = System.nanoTime();

  h1.add(2);

  System.out.println("HashSet Insertion Time:"+(System.nanoTime()-nanos)+"ns");
 }
}
Output:
ArrayList Time:495087ns
HashSet Insertion Time:21757ns

Case 2:

import java.util.*;
class HashsetVSArraylist
{
 public static void main(String args[])
 {
  HashSet<Integer> h1=new HashSet<Integer>();
  long nanos = System.nanoTime();

  h1.add(2);

  System.out.println("HashSet Insertion Time:"+(System.nanoTime()-nanos)+"ns");

  ArrayList<Integer> a1=new ArrayList<Integer>();
  nanos = System.nanoTime();

  a1.add(1);

  System.out.println("ArrayList Time:"+(System.nanoTime()-nanos)+"ns");
 }
}
Output:
HashSet Insertion Time:582527ns
ArrayList Time:21758ns

Now, I assume HashSet should take more time for insertion of a single element. But, in both cases behavior is different...less time is taken for the data structure which comes second in the code. Also, behavior changes when the number of elements inserted is more than 1000.

Please explain what is happening.

ROMANIA_engineer
  • 54,432
  • 29
  • 203
  • 199
Bhavya Sharma
  • 309
  • 3
  • 15
  • 1
    Your test will only be valid if you measure putting LOTS of elements. Try adding 100,000 elements to both 5 times and average your results. – Upio Apr 10 '15 at 04:08
  • 1
    You should take a look at this ... [This][1] [1]: http://stackoverflow.com/questions/17985029/hashset-vs-arraylist – Khubab Hamza Apr 10 '15 at 04:19
  • Your initial assumption that `hashCode` causes small overhead does hold true in some cases. A linear search through a small array can perform better than a hash lookup. (And there are also cache considerations at play. A `HashSet` might do more loads.) But, it's a micro-optimization and not the type of thing you want to design your code around. The difference is minimal. – Radiodef Apr 10 '15 at 08:07

2 Answers2

2

Your benchmark is broken. Read: Dynamic compilation and performance measurement and: Anatomy of a flawed microbenchmark before trying to benchmark in Java.

The short explanation is that the total time duration you are trying to measure there is much, much too short, and the benchmark results will be swamped by tiny OS and CPU details, and by the fact that the Java VM is still busy compiling the bytecode to machine code while it begins to run.

Meanwhile, it is a bit mad to compare ArrayList and HashList on performance when they serve two different purposes, but all else being equal, the implementation of ArrayList is significantly simpler, so your assumption is surely correct; it will be faster.

Boann
  • 48,794
  • 16
  • 117
  • 146
1

The true issue here is hidden by the auto-boxing that Java is doing in converting your integer primitive into an Integer object.

When you call a1.add(1) this is actually calling a1.add(Integer.valueOf(1))

The first time you reference the static valueOf method of the Integer class, that is causing the execution of the static initializer in the Integer class, which creates hundreds of static objects and on your system is taking about 500ms.

Even with that, there are a lot of other things going on behind the scenes that interfere with this test such as other static initializers, dynamic memory allocation, system resource allocation, and a myriad of others.

If you can design a test that eliminates or minimizes those variables, then you will find that adds to ArrayList is always faster than HashSet over the long run, but not for any given insertion. Luckily it should never be the case that we care about the speed of a single insertion.

For example imagine the nearly worst case scenario of trying to add a value to an ArrayList, but that ArrayList is at it's maximum allocated size. The ArrayList tries to allocate more room, but the system has hit it's current memory allocation cap so it needs to wait for the VM to allocate more memory with the system. Simultaneously the garbage collector gets kicked off. In this case an insert which may take <1ms normally can take multiple seconds to execute.

Alex Figliolia
  • 471
  • 2
  • 5
  • Integers will exist from very early on in the VM. By the time `main` is called, the Integer class is certainly already initialized. – Boann Apr 10 '15 at 04:32
  • Why would the IntegerCache be initialized before main? No Integer objects are passed in to main, and in the main class there are no Integers referenced? – Alex Figliolia Apr 10 '15 at 04:39
  • 1
    There are a few hundred classes loaded before you even get to main. Use the `-verbose:class` argument to see them. – Boann Apr 10 '15 at 04:52
  • 1
    @Boann we are both correct. Integer is loaded ahead of time, but the IntegerCache child class is only initialized when Integer.valueOf is called. Try this with -verbose:class `public class HelloWorld{ public static void main(String args[]) { System.out.println("Start\n"); Integer.valueOf(1); System.out.println("End\n"); } }` – Alex Figliolia Apr 10 '15 at 04:56
  • It's interesting to see the difference in loading time between Integer and IntegerCache; that's something I neglected to consider. But at least on my VM, both classes are loaded long before it gets to "Start". It seems [`sun.misc.Signal`](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/87d655ae0753/src/share/classes/sun/misc/Signal.java) first calls `Integer.valueOf` (via autoboxing) when it does `signals.put(sig.number, sig);`. – Boann Apr 10 '15 at 05:03
  • The code for IntegerCache is pretty brutal, instantiating 256 objects: private static class IntegerCache { private IntegerCache(){} static final Integer cache[] = new Integer[-(-128) + 127 + 1]; static { for(int i = 0; i < cache.length; i++) cache[i] = new Integer(i - 128); } }` – Alex Figliolia Apr 10 '15 at 05:06