35

I am wondering what the memory overhead of Java's HashMap is compared to ArrayList?

Update:

I would like to improve the speed of searching for specific values in a big pack (6 million+) of identical objects.

Thus, I am thinking about using one or several HashMaps instead of an ArrayList. But I am wondering what the overhead of a HashMap is.

As far as I understand, the key is not stored, only the hash of the key, so it should be something like the size of the hash of the object plus one pointer.

But what hash function is used? Is it the one offered by Object or another one?

elhoim

13 Answers

45

If you're comparing HashMap with ArrayList, I presume you're doing some sort of searching/indexing of the ArrayList, such as binary search or a custom hash table? Because a .get(key) through 6 million entries would be infeasible using a linear search.

Using that assumption, I've done some empirical tests and come to the conclusion that "You can store 2.5 times as many small objects in the same amount of RAM if you use ArrayList with binary search or a custom hash map implementation, versus HashMap". My test was based on small objects containing only 3 fields, of which one is the key, and the key is an integer. I used a 32-bit JDK 1.6. See below for caveats on this figure of "2.5".

The key things to note are:

(a) it's not the space required for references or "load factor" that kills you, but rather the overhead required for object creation. If the key is a primitive type, or a combination of 2 or more primitive or reference values, then each key will require its own object, which carries an overhead of 8 bytes.

(b) In my experience you usually need the key as part of the value, (e.g. to store customer records, indexed by customer id, you still want the customer id as part of the Customer object). This means it is IMO somewhat wasteful that a HashMap separately stores references to keys and values.
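To make point (b) concrete, here is a minimal sketch with a hypothetical Customer class (the class and field names are mine, not from the original test): the customer id lives twice, once as the boxed Integer key and once as a field of the value object.

```java
import java.util.HashMap;
import java.util.Map;

public class CustomerIndex {
    static class Customer {
        final int id;          // the key, duplicated inside the value
        final String name;
        Customer(int id, String name) { this.id = id; this.name = name; }
    }

    public static void main(String[] args) {
        Map<Integer, Customer> byId = new HashMap<>();
        Customer c = new Customer(42, "Alice");
        byId.put(c.id, c);     // autoboxes 42 into an extra Integer object
        System.out.println(byId.get(42).name);
    }
}
```

The HashMap here pays for an Integer object and a key reference per entry, even though the id is already available as a field of Customer; a custom structure that hashes on payload.key avoids that duplication.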

Caveats:

  1. The most common type used for HashMap keys is String. The object creation overhead doesn't apply there, so the difference would be less.

  2. I got a figure of 2.8, being 8880502 entries inserted into the ArrayList compared with 3148004 into the HashMap on a -Xmx256M JVM, but my ArrayList load factor was 80% and my objects were quite small: 12 bytes plus 8 bytes of object overhead.

  3. My figure, and my implementation, requires that the key is contained within the value, otherwise I'd have the same problem with object creation overhead and it would be just another implementation of HashMap.

My code:

public class Payload {
    int key,b,c;
    Payload(int _key) { key = _key; }
}


import org.junit.Test;

import java.util.HashMap;
import java.util.Map;


public class Overhead {
    @Test
    public void useHashMap()
    {
        int i=0;
        try {
            Map<Integer, Payload> map = new HashMap<Integer, Payload>();
            for (i=0; i < 4000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }

    @Test
    public void useArrayList()
    {
        int i=0;
        try {
            ArrayListMap map = new ArrayListMap();
            for (i=0; i < 9000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }
}


import java.util.ArrayList;


public class ArrayListMap {
    private ArrayList<Payload> map = new ArrayList<Payload>();
    private int[] primes = new int[128];

    static boolean isPrime(int n)
    {
        for (int i=(int)Math.sqrt(n); i >= 2; i--) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    ArrayListMap()
    {
        for (int i=0; i < 11000000; i++)    // this is clumsy, I admit
            map.add(null);
        int n=31;
        for (int i=0; i < 128; i++) {
            while (! isPrime(n))
                n+=2;
            primes[i] = n;
            n += 2;
        }
        System.out.println("Capacity = " + map.size());
    }

    public void put(int key, Payload value)
    {
        int hash = key % map.size();
        // normalize so a negative key cannot produce a negative array index
        int hash2 = primes[(key % primes.length + primes.length) % primes.length];
        if (hash < 0)
            hash += map.size();
        do {
            if (map.get(hash) == null) {
                map.set(hash, value);
                return;
            }
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }

    public Payload get(int key)
    {
        int hash = key % map.size();
        // normalize so a negative key cannot produce a negative array index
        int hash2 = primes[(key % primes.length + primes.length) % primes.length];
        if (hash < 0)
            hash += map.size();
        do {
            Payload payload = map.get(hash);
            if (payload == null)
                return null;
            if (payload.key == key)
                return payload;
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }
}
Tim Cooper
  • Hi Tim. In the majority of cases the key set is well defined and finite. I believe you can optimize your code greatly by adding a key cache to avoid object creation for the keys. What do you think? – Rafael Sanches Dec 06 '12 at 09:20
  • @Rafael Sanches: Can you explain what you mean by "adding a key cache to evict object creation"? – Tim Cooper Dec 08 '12 at 01:41
15

The simplest thing would be to look at the source and work it out that way. However, you're really comparing apples and oranges - lists and maps are conceptually quite distinct. It's rare that you would choose between them on the basis of memory usage.

What's the background behind this question?

Jon Skeet
  • I'm always amazed at these ArrayList vs HashMap questions. ArrayList vs HashSet I can see making sense, but a Map isn't even a Collection. – Laurence Gonsalves Oct 06 '09 at 16:25
  • This particular question is somewhat confusing, since it's talking about memory consumption between a Map and a List... However the question may stem from the fact that elhoim is using a very large list and lookups are unsatisfactory (you can use LinkedHashMaps to retain the order, more or less). They may not want their application's footprint to balloon simply because they switch to a map. – Malaxeur Oct 06 '09 at 16:33
  • Map is, strictly speaking, a collection (it is not a Collection however): http://java.sun.com/javase/6/docs/technotes/guides/collections/overview.html – TofuBeer Oct 06 '09 at 16:46
  • TofuBeer: note the capitalization I used. – Laurence Gonsalves Oct 06 '09 at 17:31
  • I'm not sure I agree here - I occasionally think "Shall I use a Map instead of a List", if the keys are sparse and there would otherwise be a lot of nulls in the list, or if I need to populate the list in an unpredictable order. – finnw Oct 06 '09 at 18:26
  • @finnw: Well, I did say "rare" rather than "unheard of". – Jon Skeet Oct 06 '09 at 18:43
  • I would love it if the first downvote required a little comment so you could just have a clue what the person was thinking. Only really needed for those mysterious random -1 downvotes that occasionally occur on perfectly good answers. – Bill K Oct 06 '09 at 21:06
9

All that is stored in either is pointers. Depending on your architecture, a pointer will be 32 or 64 bits (or more or less).

An array list of 10 tends to allocate 10 "Pointers" at a minimum (and also some one-time overhead stuff).

A map has to allocate twice that (20 pointers) because it stores two values at a time. On top of that, it has to store the hash table, which should be bigger than the map: at a load factor of 75%, it should be around 13 32-bit values (hashes).

so if you want an offhand answer, the ratio should be about 1:3.25 or so, but you are only talking pointer storage--very small unless you are storing a massive number of objects--and if so, the utility of being able to reference instantly (HashMap) vs iterate (array) should be MUCH more significant than the memory size.

Oh, also: Arrays can be fit to the exact size of your collection. HashMaps can as well if you specify the size, but if it "Grows" beyond that size, it will re-allocate a larger array and not use some of it, so there can be a little waste there as well.

Bill K
  • "A map has to allocate twice that (20 pointers) because it stores two values at a time" assuming the keys and values are distinct. We don't really have any idea what the author is hoping to store since he hasn't given us many details. – matt b Oct 06 '09 at 20:06
  • A map always has to allocate storage for key and value, even if they are the same (hence the word "map"). a HashSet might be what you are thinking of @matt b, it just allocates a single array and you do the mapping inside your object, in which case the ratio would be about 1:2.25 instead. – Bill K Oct 06 '09 at 21:04
7

I don't have an answer for you either, but a quick google search turned up a function in Java that might help.

Runtime.getRuntime().freeMemory();

So I propose that you populate a HashMap and an ArrayList with the same data. Record the free memory, delete the first object, record memory, delete the second object, record the memory, compute the differences,..., profit!!!

You should probably do this with magnitudes of data, i.e. start with 1000, then 10000, 100000, 1000000.

EDIT: Corrected, thanks to amischiefr.

EDIT: Sorry for editing your post, but this is pretty important if you are going to use this (and it's a little much for a comment). freeMemory does not work the way you might think. First, its value is changed by garbage collection. Second, its value is changed when Java allocates more memory. Using the freeMemory call alone doesn't provide useful data.

Try this:

public static void displayMemory() {
    Runtime r=Runtime.getRuntime();
    r.gc();
    r.gc(); // YES, you NEED 2!
    System.out.println("Memory Used="+(r.totalMemory()-r.freeMemory()));
}

Or you can return the memory used and store it, then compare it to a later value. Either way, remember the two gc() calls and subtract from totalMemory().
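For instance, a rough sketch of the before/after technique described above (the allocation and names are illustrative; the reported byte count will vary by JVM):

```java
public class MemoryDiff {
    static long usedMemory() {
        Runtime r = Runtime.getRuntime();
        r.gc();
        r.gc(); // two calls, as above, to encourage a full collection
        return r.totalMemory() - r.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        int[] data = new int[1_000_000];   // roughly 4 MB of int storage
        long after = usedMemory();
        System.out.println("Approx bytes used: " + (after - before));
        System.out.println(data.length);   // keep 'data' reachable until here
    }
}
```

Note the final use of `data`: if the array became unreachable before the second measurement, the gc() calls could collect it and the diff would be meaningless.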

Again, sorry to edit your post!

Bill K
sanscore
  • The method "Returns the total amount of memory in the Java virtual machine.", not the amount of memory used by the current application, or remaining memory. For that you need to call freeMemory() – amischiefr Oct 06 '09 at 16:40
  • @Bill: To avoid having the gc change your metric, you'll need to start the VM with the same initial/maximum size. Invoking gc() twice has no effect if you have a reference to the data structure (that is, the data structure is not gc'able) – OscarRyz Oct 06 '09 at 18:11
  • @Oscar, gc() does make a difference, because both ArrayList and HashMap must sometimes reallocate arrays when the collection outgrows the original array. And the old array may not be freed immediately. – finnw Oct 06 '09 at 19:01
  • @BILL: No need to apologize. I have just a basic understanding of the JVM. Thanks for fleshing out the details. If you are still around, would you care to explain why two gc() calls are needed? Is this documented or a JVM quirk? – sanscore Oct 06 '09 at 19:03
  • Uh, there is no guarantee that the VM will actually do a full garbage collection when you call gc(). It is non-deterministic. Calling it twice in a row just doubles the chances. This seems foolish to me. – matt b Oct 06 '09 at 20:03
  • I tested about a million times (okay, tens of times) to get this to work. The first one releases some easy ones but is not a "full" gc sweep. The second one always invokes a full GC and after that, for a few microseconds at least, the freeMemory call gives reliable results--but of course this is VM-dependent and I haven't tested every VM... – Bill K Oct 06 '09 at 20:58
  • PS. I have no idea why someone voted you down, this is actually a VERY USABLE technique--I've done quite a bit of performance testing. I kind of wish the first 3 downvotes on an answer required a line of text or something... Anyway, +1 to counteract potential ignorance :) – Bill K Oct 06 '09 at 21:00
  • I have a better idea. Invert the procedure: record the memory footprint (total and free), create the object, record memory again. Simpler and you don't need to worry about gc(). I would modify the code, but I am in class. Lab classrooms, FTW! – sanscore Oct 06 '09 at 22:16
  • My comments were assuming there is more going on than just that test--If you aren't collecting anything, then it might work without the GC except in the case where your allocation causes a GC which will give you bad results again. In general, better safe than sorry. And yes, it is meant to work by diffing the before and after values. – Bill K Oct 06 '09 at 23:59
3

HashMaps try to maintain a load factor (usually 75% full); you can think of a HashMap as a sparsely filled ArrayList. The problem with a straight-up size comparison is that the map's table grows with the data in order to maintain this load factor. An ArrayList, on the other hand, grows to meet its needs by enlarging its internal array (by roughly 50% per resize). For relatively small sizes they are comparable, but as you pack more and more data into the map, it requires a lot of empty references in order to maintain the hash performance.

In either case I recommend priming the expected size of the data before you start adding. This will give the implementations a better initial setting and will likely consume less overall in both cases.
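A minimal sketch of what "priming the expected size" looks like (the sizes are illustrative; the capacity formula for HashMap assumes the default 0.75 load factor so the table never needs to rehash while filling):

```java
import java.util.ArrayList;
import java.util.HashMap;

public class Presize {
    public static void main(String[] args) {
        int expected = 6_000_000;
        // Pre-sizing avoids repeated internal array reallocation while filling.
        ArrayList<String> list = new ArrayList<>(expected);
        // Choose a capacity so 'expected' entries stay under the load factor.
        HashMap<Integer, String> map =
                new HashMap<>((int) (expected / 0.75f) + 1, 0.75f);
        System.out.println(list.size() + " " + map.size());
    }
}
```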

Update:

Based on your updated problem, check out Glazed Lists. This is a neat little tool written by some of the Google people for doing operations similar to the one you describe. It's also very quick, and allows clustering, filtering, searching, etc.

reccles
3

HashMap holds a reference to the value and a reference to the key.

ArrayList just holds a reference to the value.

So, assuming that the key uses the same memory as the value, HashMap uses 50% more memory (although strictly speaking, it is not the HashMap that uses that memory, because it just keeps references to it).

On the other hand, HashMap provides constant-time performance for the basic operations (get and put). So, although it may use more memory, getting an element may be much faster using a HashMap than an ArrayList.

So, the next thing you should do is not care about which uses more memory, but about what they are good for.

Using the correct data structure for your program saves more CPU/memory than how the library is implemented underneath.

EDIT

After Grant Welch answer I decided to measure for 2,000,000 integers.

Here's the source code

This is the output

$
$javac MemoryUsage.java  
Note: MemoryUsage.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
$java -Xms128m -Xmx128m MemoryUsage 
Using ArrayListMemoryUsage@8558d2 size: 0
Total memory: 133.234.688
Initial free: 132.718.608
  Final free: 77.965.488

Used: 54.753.120
Memory Used 41.364.824
ArrayListMemoryUsage@8558d2 size: 2000000
$
$java -Xms128m -Xmx128m MemoryUsage H
Using HashMapMemoryUsage@8558d2 size: 0
Total memory: 133.234.688
Initial free: 124.329.984
  Final free: 4.109.600

Used: 120.220.384
Memory Used 129.108.608
HashMapMemoryUsage@8558d2 size: 2000000
OscarRyz
  • This is not surprising - there are 2000000 elements in the list, but only 65536 entries in the map. Why are you doing the cast to short? Also 2000000 is rather large (I get an OutOfMemoryError with the default heap settings.) And finally you omitted the calls to System.gc() that Grant suggested. Adding these and reducing the size to 20000 I get 410,376 bytes for the list and 912,680 for the map. – finnw Oct 06 '09 at 18:52
  • This is not really a good test. Do you realize that either the list or map could have been GC'ed by the time you get to printing out the size of the heap? Thus erasing any objects you allocated. – matt b Oct 06 '09 at 20:05
  • @finnw: I added the gc calls and use double for the key. The results are similar to yours. HashMap uses quite a bit more memory than ArrayList. @matt b: They are not gc'ed, because they are instance variables. I have modified the code so it is more clear now (I've added a println at the end for you to see the objects are still there) – OscarRyz Oct 06 '09 at 20:37
  • here is how you really/better measures bytes in Java :) http://code.google.com/p/memory-measurer/ – Karussell Jan 11 '12 at 21:29
2

Basically, you should be using the "right tool for the job". Since there are different instances where you'll need a key/value pair (where you may use a HashMap) and different instances where you'll just need a list of values (where you may use an ArrayList), the question of "which one uses more memory" is, in my opinion, moot, since it is not a consideration in choosing one over the other.

But to answer the question, since HashMap stores key/value pairs while ArrayList stores just values, I would assume that the addition of keys alone to the HashMap would mean that it takes up more memory, assuming, of course, we are comparing them by the same value type (e.g. where the values in both are Strings).

Avrom
2

I think the wrong question is being asked here.

If you would like to improve the speed at which you can search for an object in a List containing six million entries, then you should look into how fast these datatypes' retrieval operations perform.

As usual, the Javadocs for these classes state pretty plainly what type of performance they offer:

HashMap:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

This means that HashMap.get(key) is O(1).

ArrayList:

The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).

This means that most of ArrayList's operations are O(1), but likely not the ones that you would be using to find objects that match a certain value.

If you are iterating over every element in the ArrayList and testing for equality, or using contains(), then this means that your operation is running at O(n) time (or worse).

If you are unfamiliar with O(1) or O(n) notation, this is referring to how long an operation will take. In this case, if you can get constant-time performance, you want to take it. If HashMap.get() is O(1) this means that retrieval operations take roughly the same amount of time regardless of how many entries are in the Map.

The fact that something like ArrayList.contains() is O(n) means that the amount of time it takes grows as the size of the list grows; so iterating through an ArrayList with six million entries will not be very efficient at all.
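The difference is easy to see in a small sketch (timings are illustrative and will vary by machine; the point is the asymptotic gap, not the exact numbers):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>(n);
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) { list.add(i); map.put(i, i); }

        long t0 = System.nanoTime();
        boolean inList = list.contains(n - 1);   // O(n): scans the whole list
        long t1 = System.nanoTime();
        boolean inMap = map.containsKey(n - 1);  // O(1): one hash probe
        long t2 = System.nanoTime();

        System.out.println(inList + " " + inMap);
        System.out.println("list: " + (t1 - t0) + " ns, map: " + (t2 - t1) + " ns");
    }
}
```

Looking up the worst-case element (the last one inserted) forces contains() to walk the entire list, while the map lookup cost is essentially independent of n.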

matt b
  • The objects retrieval operations are fast since they are only POJO. And yes, I know that HashMap gets are O(1) and that is why i want to use them, but my question is still about **how much more memory will an HashMap use instead of an ArrayList**? – elhoim Oct 06 '09 at 18:31
  • the fact that your objects are POJOs has nothing to do with how fast it is to iterate over a list containing them – matt b Oct 06 '09 at 20:00
1

I don't know the exact number, but HashMaps are much heavier. Comparing the two, ArrayList's internal representation is self-evident, but HashMaps retain Entry objects, which can balloon your memory consumption.

It's not that much larger, but it's larger. A great way to visualize this would be with a dynamic profiler such as YourKit which allows you to see all heap allocations. It's pretty nice.

Malaxeur
1

This post gives a lot of information about object sizes in Java.

elhoim
0

As Jon Skeet noted, these are completely different structures. A map (such as HashMap) is a mapping from one value to another - i.e. you have a key that maps to a value, in a Key->Value kind of relationship. The key is hashed, and is placed in an array for quick lookup.

A List, on the other hand, is a collection of elements with order - ArrayList happens to use an array as the back end storage mechanism, but that is irrelevant. Each indexed element is a single element in the list.

edit: based on your comment, I have added the following information:

The key is stored in a HashMap. This is because a hash is not guaranteed to be unique for any two different elements; thus, the key has to be stored to resolve hashing collisions. If you simply want to see if an element exists in a set of elements, use a Set (the standard implementation being HashSet). If the order matters but you need a quick lookup, use a LinkedHashSet, as it keeps the order the elements were inserted in. The lookup time is O(1) on both, but the insertion time is slightly longer on a LinkedHashSet. Use a Map only if you are actually mapping from one value to another; if you simply have a set of unique objects, use a Set, and if you have ordered objects, use a List.
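A minimal sketch of the Set recommendation above (the element values are illustrative):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MembershipCheck {
    public static void main(String[] args) {
        // When you only need "does this element exist?", a Set is enough;
        // LinkedHashSet additionally preserves insertion order on iteration.
        Set<String> seen = new LinkedHashSet<>();
        seen.add("banana");
        seen.add("apple");
        seen.add("banana");                         // duplicate, ignored
        System.out.println(seen.contains("apple")); // O(1) membership test
        System.out.println(seen);                   // insertion order kept
    }
}
```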

aperkins
0

If you're considering two ArrayLists vs one HashMap, it's indeterminate; both are partially-full data structures. If you were comparing Vector vs Hashtable, Vector is probably more memory efficient, because it only allocates the space it uses, whereas a Hashtable allocates more space.

If you need a key-value pair and aren't doing incredibly memory-hungry work, just use the HashMap.

Dean J
0

This site lists the memory consumption for several commonly (and not so commonly) used data structures. From there one can see that the HashMap takes roughly 5 times the space of an ArrayList. The map will also allocate one additional object per entry.

If you need a predictable iteration order and use a LinkedHashMap, the memory consumption will be even higher.

You can do your own memory measurements with Memory Measurer.

There are two important facts to note however:

  1. A lot of data structures (including ArrayList and HashMap) allocate more space than they currently need, because otherwise they would have to frequently execute a costly resize operation. Thus the memory consumption per element depends on how many elements are in the collection. For example, an ArrayList with the default settings uses the same memory for 0 to 10 elements.
  2. As others have said, the keys of the map are stored, too. So if they are not in memory anyway, you will have to add this memory cost, too. An additional object will usually take 8 bytes of overhead alone, plus the memory for its fields, and possibly some padding. So this can also add up to a lot of memory.
Philipp Wendler