
In the following test:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Date;
    import java.util.HashSet;
    import java.util.Iterator;

    public class CollectionMemoryTest {

        public static void main(String[] args) {
            Thread t1 = new Thread(new Runnable() {
                @Override
                public void run() {
                    System.out.println("start 1: " + new Date());
                    Collection collection = new ArrayList();
                    // fill with one billion distinct objects
                    for (int i = 0; i < 1000 * 1000 * 1000; i++) {
                        collection.add(new Object());
                    }
                    System.out.println("middle 1: " + new Date());
                    // then walk the whole collection once
                    Iterator iterator = collection.iterator();
                    while (iterator.hasNext()) {
                        iterator.next().toString();
                    }
                    System.out.println("\nend 1: " + new Date());
                }
            });
            Thread t2 = new Thread(new Runnable() {
                @Override
                public void run() {
                    System.out.println("start 2: " + new Date());
                    Collection collection = new HashSet();
                    for (int i = 0; i < 1000 * 1000 * 1000; i++) {
                        collection.add(new Object());
                    }
                    System.out.println("middle 2: " + new Date());
                    Iterator iterator = collection.iterator();
                    while (iterator.hasNext()) {
                        iterator.next().toString();
                    }
                    System.out.println("\nend 2: " + new Date());
                }
            });
            t1.start();
            t2.start();
        }
    }

I got an OutOfMemoryError for the ArrayList approach.

But this highly voted answer says that HashSet consumes more memory. The broken link in that answer appears to be this one.

Is there something wrong with my test? Or is ArrayList really worse for memory usage than HashSet?

Update: actually, using collection.add("."); I get the OutOfMemoryError only for the ArrayList. But with collection.add(new Object()); I get it for both.
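The comments below explain the update: the string literal "." is a single interned instance, so the HashSet deduplicates it down to one element, while the ArrayList still stores a billion references to it. A minimal sketch of that difference (the class name and the smaller count are mine, for illustration):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashSet;

    public class DedupSketch {
        public static void main(String[] args) {
            Collection<String> list = new ArrayList<>();
            Collection<String> set = new HashSet<>();
            for (int i = 0; i < 1_000_000; i++) {
                list.add("."); // one more reference per call: the list keeps growing
                set.add(".");  // equal element: rejected, the set stays at size 1
            }
            System.out.println(list.size()); // 1000000
            System.out.println(set.size());  // 1
        }
    }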

– The Student
  • Read the javadoc of HashSet. What is probably happening is that the HashSet only contains an entry for one object. – João Rebelo Jun 08 '16 at 14:30
  • Your test is flawed. You run the two jobs in parallel, and whichever one tries to resize up when memory is exhausted will blow with OOM. In your example it's the ArrayList, but if you run the test enough times you'll see both cases dying with OOM. PS: ArrayList is more memory efficient. – Svetlin Zarev Jun 08 '16 at 14:30
  • @SvetlinZarev If I run only one of the threads and use add("."), the ArrayList will throw an OutOfMemoryError; the HashSet will not. – The Student Jun 08 '16 at 14:36
  • @JoãoRebelo `Object`'s `equals()` and `hashCode()` are implemented in a way that every instance is unique and thus he should actually have a billion elements in the set (which, assuming an object size of at least 4 bytes would amount to 4GB of data already). – Thomas Jun 08 '16 at 14:39
  • @Tom Brito If you add the string `"."`, then you are adding the same INSTANCE 1 000 000 000 times, which will result in it being added only ONCE to the hashset - that's the reason you are not seeing OOM. But if you run the example you've posted - where you add a new Object every iteration - you'll see the difference. – Svetlin Zarev Jun 08 '16 at 14:40
  • @Tom Brito, have you taken a look at the compiler byte code? It possibly converts your new Object() to a local variable and never again performs a new operation. – João Rebelo Jun 08 '16 at 14:43
  • @JoãoRebelo if he's actually using `new Object()` the compiler _should not_ cache the result in a local variable since that would break the semantics of the code (which is you _want_ to create a new instance in each iteration). Note that using a literal like `"."` basically tells the compiler that it is ok to reuse the string object since strings are immutable. On the other hand if you'd use `new String(".")` instead that should result in a new string every time and hence an OOM. – Thomas Jun 08 '16 at 14:48
  • @Thomas true enough! Even I doubted my answer as soon as I pressed enter. My bad, I have not had much sleep tonight :) – João Rebelo Jun 08 '16 at 14:52
  • As general advice for handling OutOfMemory situations, I'd suggest you acquire and analyze a heap dump (you can tell the JVM to create one on OOM; see the flags below). That way you'd see what takes up the memory and, most probably, why. In your case you should see that when using `"."` the hashset actually doesn't contain the billion elements you thought it would (i.e. the hashset is _way_ smaller than expected). – Thomas Jun 08 '16 at 14:52
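For reference, the standard HotSpot flags for the heap-dump suggestion above are shown below; the dump path is illustrative, and the class name is the wrapper class from the test above. Open the resulting .hprof file in a heap analyzer such as Eclipse MAT.

    java -XX:+HeapDumpOnOutOfMemoryError \
         -XX:HeapDumpPath=/tmp/oom.hprof \
         CollectionMemoryTest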

2 Answers


When you want to measure the size of an object in your heap, one good approach is to use the excellent tool SizeOf, which lets you get the deep size of your object thanks to the method SizeOf.deepSizeOf.

So here, for example, an ArrayList with only 10 million Object instances (I did not take 1 billion as in your example) takes 205 MB; if I properly set the initial capacity to 10 million (thanks to new ArrayList(size)), it shrinks to 190 MB. The HashSet, whether I set the initial capacity or not, takes 521 MB, so it is more than twice as big.

NB: I got those results with a 64-bit JVM with UseCompressedOops enabled.

Please note that the size will depend on the target JVM: the result will change depending on whether it is a 32-bit JVM, a 64-bit JVM with UseCompressedOops enabled (-XX:+UseCompressedOops, the default for heaps smaller than 32 GB), or a 64-bit JVM with UseCompressedOops disabled (-XX:-UseCompressedOops).
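A sketch of how such a measurement might look. The net.sourceforge.sizeof.SizeOf package and its -javaagent setup follow the tool's documentation, but treat those details as assumptions to verify against the version you download:

    // Assumes the SizeOf jar is attached with -javaagent:SizeOf.jar,
    // as described in the tool's documentation.
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashSet;

    import net.sourceforge.sizeof.SizeOf;

    public class DeepSizeSketch {
        public static void main(String[] args) {
            int size = 10_000_000;
            Collection<Object> list = new ArrayList<>(size); // pre-sized backing array
            Collection<Object> set = new HashSet<>();
            for (int i = 0; i < size; i++) {
                list.add(new Object());
                set.add(new Object());
            }
            // deepSizeOf walks the whole object graph: the collection, its
            // internal arrays/nodes, and every referenced element.
            System.out.println("ArrayList: " + SizeOf.deepSizeOf(list) + " bytes");
            System.out.println("HashSet:   " + SizeOf.deepSizeOf(set) + " bytes");
        }
    }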

– Nicolas Filotto
  • When you set the size of a map, the current implementation will change it to the smallest power of two that is >= the size you set. So if you set size=7, the actual size will be 8. If you set it to 65, the actual will be 128, etc. – Svetlin Zarev Jun 08 '16 at 18:00
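To illustrate the rounding described in the comment above, here is a small sketch that mirrors the power-of-two computation HashMap performs internally (an illustrative re-implementation, not the JDK source):

    public class TableSizeSketch {
        // Rounds a requested capacity up to the next power of two by
        // smearing the highest set bit into all lower positions.
        static int nextPowerOfTwo(int requested) {
            int n = requested - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return n + 1;
        }

        public static void main(String[] args) {
            System.out.println(nextPowerOfTwo(7));          // 8
            System.out.println(nextPowerOfTwo(65));         // 128
            System.out.println(nextPowerOfTwo(10_000_000)); // 16777216 (2^24)
        }
    }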

HashSets consuming more memory would make sense, because they need to set up space for potentially hashed values (buckets). Now, I don't know if it consumes 5.5 times more memory (that of course depends on the nature of the objects you're storing), but think of it like this: HashSets are to Hashtables as ArrayLists are to arrays. You're going to have some empty spaces in a HashSet, and that's where the extra memory is being consumed; but for the 1 billion iterations you're doing, it's probably going to be worth it.

– Mr. DROP TABLE
    The hashset does not store the `hashed values` – Svetlin Zarev Jun 08 '16 at 14:41
  • Thanks, my mistake! – Mr. DROP TABLE Jun 08 '16 at 14:52
  • `Hashsets are to Hashtables as arraylists are to arrays` - I don't agree, at least not with the conclusion you added after that statement. Hashtables are basically like hashmaps with the exception of being threadsafe. Hashsets on the other hand are implemented using hashmaps and thus are very similar to them (the elements are the keys and the values are just dummy objects). In any case they only waste memory if not all buckets are filled (like arraylists with free capacity). Given an equal number of buckets hashtables and hashsets should waste about an equal amount of memory. – Thomas Jun 08 '16 at 14:57
  • @SvetlinZarev actually, `HashSet` does store the hashes. Have a look at `java.util.HashMap.Node` which is the underlying container because `HashSet` internally uses a `Map`. `Node` has a field `int hash` in addition to `K key`, i.e. it stores the hash as well as the reference to the element. – Thomas Jun 08 '16 at 15:03
  • Don't hashsets implement hashtables though? Or am I more confused than I thought?!?! – Mr. DROP TABLE Jun 08 '16 at 15:10
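To make the per-element overhead discussed in these comments concrete, here is a simplified sketch of the two layouts. The node fields are modeled on OpenJDK's java.util.HashMap.Node, but this is an illustration rather than the actual JDK code:

    // Simplified sketch of per-element storage (illustrative only; the
    // real JDK classes have more fields and logic).
    public class OverheadSketch {

        // ArrayList: one reference per element in a backing array,
        // plus any unused capacity at the end.
        static class SimpleArrayList {
            Object[] elementData; // 4-8 bytes per slot
        }

        // HashSet delegates to a HashMap whose entries look roughly like:
        static class SimpleNode {
            final int hash;   // cached hash code of the element
            final Object key; // the element itself
            Object value;     // a shared dummy object in HashSet
            SimpleNode next;  // chain for bucket collisions
        }

        // Each HashSet element therefore costs a node object (header plus
        // four fields) in addition to a slot in the bucket array, versus a
        // single array slot for ArrayList - consistent with the more than
        // twofold difference measured in the other answer.
    }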