
I have a ConcurrentHashMap like so:

HashMap<String, Integer> fruitMap = new ConcurrentHashMap<>();

The key is a String of 10 characters, the value is an Integer.

Assuming there is no other memory consuming code in my application, how do I calculate the number of entries that can be stored in the HashMap on a server with 10GiB memory?

It'll be great if you can mention how we can calculate it for both Java 7 and Java 8 or later.

PS: I found this, but I didn't understand how the 6.75KB memory usage for hashmap of 100 ints mapped to ints was arrived at.

seeker
  • _It depends_ on the JVM version, on how big your heap is going to be, on some enabled/disabled flags. There is no `X` answer. And of course `HashMap` is not assignable to `ConcurrentHashMap` – Eugene Jan 08 '21 at 16:38
  • And you will not get an exact number, only a ballpark figure. Is that enough for you, e.g. 10M entries vs. 100M entries vs. 1B entries? – luk2302 Jan 08 '21 at 16:46
  • The 6.75KB comes from the fact that it's not the primitive type that gets stored but Integer objects, which have some overhead. See https://stackoverflow.com/questions/8419860/integer-vs-int-with-regard-to-memory . Additionally, a hash map internally stores data in buckets. All keys that map to the same hash code end up in the same bucket, and that has some overhead too. – Rajesh Jose Jan 08 '21 at 16:49
  • @luk2302 why do you say that? of course you can compute the exact size under a specific JVM version with specific flags enabled. – Eugene Jan 08 '21 at 17:27
  • @Eugene The exact size will depend on the exact content (effect on buckets, chains, etc), load factor, concurrency level, initial size, etc, so it is hard to generalize. – Mark Rotteveel Jan 08 '21 at 17:34
  • @MarkRotteveel agreed, my point was that for a very specific case - this is totally doable. – Eugene Jan 08 '21 at 17:35
  • If this answered your question, you can accept it. – Eugene Jan 16 '21 at 17:53

1 Answer


I will only provide an example against jdk-15 using JOL (the only tool I would trust for this), for a ConcurrentHashMap with 10 entries. Extrapolating from there is up to you.

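import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.openjdk.jol.info.GraphLayout;

Map<String, Integer> throttleMap = new ConcurrentHashMap<>();

// keys are "0000000000", "1111111111", ..., "9999999999": ten characters each
for (int i = 0; i < 10; ++i) {
    throttleMap.put(("" + i).repeat(10), i);
}

// walk every object reachable from the map and print a per-class summary
System.out.println(GraphLayout.parseInstance((Object) throttleMap).toFootprint());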

This will output:

 COUNT       AVG       SUM   DESCRIPTION
    10        32       320   [B
     1        80        80   [Ljava.util.concurrent.ConcurrentHashMap$Node;
    10        16       160   java.lang.Integer
    10        24       240   java.lang.String
     1        64        64   java.util.concurrent.ConcurrentHashMap
    10        32       320   java.util.concurrent.ConcurrentHashMap$Node
    42                1184   (total)

Understanding the above is not trivial. Integer is the easiest one:

  • 12 bytes for two headers
  • 4 bytes for the inner int field

So 16 bytes for one, you have 10 of those, thus that line:

 10        16       160   java.lang.Integer

An instance of String is more involved:

  • 12 bytes for headers
  • 4 bytes for hash field
  • 1 byte for coder field
  • 1 byte for the boolean hashIsZero field (what is hashIsZero?)
  • 2 bytes for padding
  • 4 bytes for value (byte [])

So 24 bytes * 10:

 10        24       240   java.lang.String

That inner byte [] will also add:

  • 12 bytes of headers (byte[] is an Object).
  • 4 bytes for the length field
  • 10 bytes for the contents (10 characters at 1 byte each)
  • 6 bytes padding

Thus that:

 10        32       320   [B
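The three breakdowns above all follow the same pattern: add up the header and the fields, then round up to the next multiple of 8. Here is a minimal sketch of that arithmetic, assuming a 12-byte object header, 4-byte compressed references and 8-byte object alignment (what the numbers above imply, but it depends on JVM flags). The class name `LayoutArithmetic` and its helpers are made up for illustration, and the `Node` field list (an `int` hash plus three references) is my reading of the `ConcurrentHashMap` source, not something JOL printed here.

```java
public class LayoutArithmetic {
    static final int HEADER = 12;       // assumed: 8-byte mark word + 4-byte compressed class pointer
    static final int ARRAY_LENGTH = 4;  // length field that every array carries
    static final int REF = 4;           // assumed: compressed reference

    /** Rounds a raw size up to the next multiple of 8 (assumed default alignment). */
    static long align(long raw) {
        return (raw + 7) / 8 * 8;
    }

    /** Integer: header + one int field. */
    static long integerSize() {
        return align(HEADER + 4);
    }

    /** String: header + int hash + byte coder + boolean hashIsZero + reference to byte[]. */
    static long stringSize() {
        return align(HEADER + 4 + 1 + 1 + REF);
    }

    /** byte[]: header + length field + one byte per element. */
    static long byteArraySize(int length) {
        return align(HEADER + ARRAY_LENGTH + length);
    }

    /** Node[] table: header + length field + one reference per slot. */
    static long refArraySize(int slots) {
        return align(HEADER + ARRAY_LENGTH + (long) REF * slots);
    }

    /** Node (assumed fields): header + int hash + key, val and next references. */
    static long nodeSize() {
        return align(HEADER + 4 + 3 * REF);
    }

    public static void main(String[] args) {
        System.out.println("Integer   " + integerSize());      // 16
        System.out.println("String    " + stringSize());       // 24
        System.out.println("byte[10]  " + byteArraySize(10));  // 32
        System.out.println("Node[16]  " + refArraySize(16));   // 80
        System.out.println("Node      " + nodeSize());         // 32
    }
}
```

These match the AVG column of the footprint, including the 80-byte `[Ljava.util.concurrent.ConcurrentHashMap$Node;` row, which is the default 16-slot table. The JVM may place the padding between fields rather than at the end, as in the String breakdown, but the rounded total is the same.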

Getting the overall picture is left as an exercise to you.
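As a starting point for that exercise, here is a rough sketch that turns the footprint into a ballpark entry count for a 10 GiB heap. It is an estimate under loud assumptions, not an exact answer: the per-object sizes are copied from the jdk-15 output above, each table slot is taken to be a 4-byte compressed reference, the whole 10 GiB is assumed to be usable heap, and the number of table slots per entry is assumed to swing between about 4/3 and 8/3 with the default 0.75 load factor. The class `ChmCapacityEstimate` and its method names are hypothetical.

```java
public class ChmCapacityEstimate {
    // Sizes in bytes, copied from the JOL footprint (jdk-15, compressed references).
    static final long NODE = 32, STRING = 24, BYTE_ARRAY = 32, INTEGER = 16;
    // Assumption: each slot in the Node[] table is a 4-byte compressed reference.
    static final long SLOT = 4;

    /** Objects owned by one entry: Node + String key + its byte[] + Integer value. */
    static long objectBytesPerEntry() {
        return NODE + STRING + BYTE_ARRAY + INTEGER;
    }

    /** Rough entry count for a heap, given an assumed number of table slots per entry. */
    static long estimateEntries(long heapBytes, double slotsPerEntry) {
        double perEntry = objectBytesPerEntry() + SLOT * slotsPerEntry;
        return (long) (heapBytes / perEntry);
    }

    public static void main(String[] args) {
        // 10 GiB, assumed to be fully available to the map
        long heap = 10L * 1024 * 1024 * 1024;

        System.out.println("object bytes per entry: " + objectBytesPerEntry());
        // Assumption: about 4/3 slots per entry just before a resize,
        // about 8/3 just after one.
        System.out.println("optimistic:  " + estimateEntries(heap, 4.0 / 3));
        System.out.println("pessimistic: " + estimateEntries(heap, 8.0 / 3));
    }
}
```

The 104 bytes per entry agree with the footprint: 1184 total, minus 80 for the table and 64 for the map itself, leaves 1040 for 10 entries. Under these assumptions the result lands somewhere between 90 and 100 million entries. A real run will come in lower, because the JVM needs headroom for garbage collection and a resize briefly keeps both the old and the new table alive.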

Eugene
  • What are the other lines in the output indicating? `10 32 320 [B` `1 80 80 [Ljava.util.concurrent.ConcurrentHashMap$Node` `1 64 64 java.util.concurrent.ConcurrentHashMap` Does the final output line mean that a total 1184 bytes are used?: `42 1184 (total)` – seeker Jan 08 '21 at 17:52
  • @seeker yes, that means 1184 bytes. – Eugene Jan 08 '21 at 17:55
  • Thanks. I'm still wondering what's the difference between the 1st line in the output and 6th line in the output. Why is `320 bytes` used twice? – seeker Jan 08 '21 at 18:03
  • @seeker different Objects, exactly as the `Description` field says? – Eugene Jan 08 '21 at 18:06
  • Right. I'm trying to understand what is `[B (bytes[])` used for and what is `java.util.concurrent.ConcurrentHashMap$Node` used for? One of them is probably to store the bucket information of the HashMap, but which one is it and what is the other one used for. – seeker Jan 08 '21 at 18:11
  • @seeker did you actually read what I wrote in the answer? I did explain what that `byte[]` is for, at the very end. That `Node` is an internal class in `CHM`, that stores the key value pairs ( which you would find out by simply looking at the source code of it ) – Eugene Jan 08 '21 at 18:16