I have a hash set of 1000 strings. Each string has a length of 10.
Can you tell me the exact number of bytes required to store this in memory, for both 32-bit and 64-bit VMs?
Can you explain how to calculate this?
Because I have no life, I present the results of boredom. Note that this is pretty much guaranteed to be inaccurate, due to stupid mistakes and such. I used this for help, but I'm not too sure of its accuracy. I could read the JVM specifications, but I don't have that much free time on my hands.
This calculation gets pretty complicated due to the multitude of fields that exist inside the objects of concern, plus some uncertainty on my part about how much overhead there is for objects and where padding goes. If memory serves, objects have 8 bytes reserved for the header. This is all for a 64-bit VM, by the way. Only difference between that and a 32-bit VM is the size of references, I think.
Summary of how to do this: obtain the source code and recursively add up the space needed for all fields. This requires knowing how the VM works and how the classes involved are implemented.
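The "add up the fields" step can be roughly sketched with reflection. This is only an illustration under stated assumptions: the 8-byte header and 8-byte references are the figures assumed in this answer, alignment padding is ignored, and the class and method names (`ShallowSizeSketch`, `shallowSize`, `Sample`) are made up for the example. It covers a single level of the recursion (one object's own fields), and it skips static fields on the grounds that they are stored once per class rather than in every instance, so it will not line up exactly with a hand tally that counts them.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class ShallowSizeSketch {
    // Assumed figures, not measured: 8-byte object header, 8-byte references.
    static final int HEADER_BYTES = 8;
    static final int REFERENCE_BYTES = 8;

    // Size of one field slot, based only on its declared type.
    static int sizeOfType(Class<?> type) {
        if (type == long.class || type == double.class) return 8;
        if (type == int.class || type == float.class) return 4;
        if (type == short.class || type == char.class) return 2;
        if (type == byte.class || type == boolean.class) return 1;
        return REFERENCE_BYTES; // any object or array reference
    }

    // Header plus all instance fields declared by the class and its superclasses.
    // No alignment padding is applied, so real sizes may be larger.
    static int shallowSize(Class<?> type) {
        int total = HEADER_BYTES;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // static fields live with the class, not the instance
                }
                total += sizeOfType(f.getType());
            }
        }
        return total;
    }

    // Hypothetical class used only to exercise the sketch.
    static final class Sample {
        static int COUNTER; // skipped
        long id;            // 8
        int hash;           // 4
        char[] value;       // 8 (reference only)
    }

    public static void main(String[] args) {
        System.out.println(shallowSize(Sample.class)); // 8 + 8 + 4 + 8 = 28
    }
}
```

To get a deep size you would still have to follow each reference and add the size of whatever it points to, which is the recursive part.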
Starting from a String. String defines:

* long serialVersionUID - 8 bytes
* int hash - 4 bytes + 4 bytes padding
* char[] value (set to a char[10] in your case) - 8 bytes for reference
* ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0] - 8 bytes for reference

char[10] defines:

* int length - 4 bytes
* char x10 - 2 bytes * 10 = 20 bytes

ObjectStreamField[0] defines:

* int length - 4 bytes + 4 bytes padding

Total for a single String with length 10: 88 bytes (the fields above plus an 8-byte header for each of the three objects)
Total for 1000 Strings with length 10: 88000 bytes.
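For anyone who wants to check the arithmetic, here is the String tally written out as code. It just re-adds the figures used above (8-byte header, 8-byte references, the listed fields); the class and method names are made up for the example, and no alignment rounding is applied, so lengths other than 10 may come out slightly off.

```java
final class StringFootprint {
    // Figures assumed in this answer, not measured on a real VM.
    static final int HEADER = 8;
    static final int REFERENCE = 8;

    // String object: header, serialVersionUID, hash (+ padding), two references.
    static int stringObjectBytes() {
        return HEADER + 8 + (4 + 4) + REFERENCE + REFERENCE;
    }

    // char[] backing array: header, length field, 2 bytes per char.
    static int charArrayBytes(int length) {
        return HEADER + 4 + 2 * length;
    }

    // Empty ObjectStreamField[]: header, length field (+ padding).
    static int emptyFieldArrayBytes() {
        return HEADER + (4 + 4);
    }

    static int totalBytes(int length) {
        return stringObjectBytes() + charArrayBytes(length) + emptyFieldArrayBytes();
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(10));        // 40 + 32 + 16 = 88
        System.out.println(1000 * totalBytes(10)); // 88000
    }
}
```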
HashSet defines:

* long serialVersionUID - 8 bytes
* Object PRESENT - 8 bytes
* HashMap<E, Object> map - 8 bytes

HashMap defines (in Java 8) (ignoring things that are created on demand, like EntrySet):

* long serialVersionUID - 8 bytes
* int DEFAULT_INITIAL_CAPACITY - 4 bytes
* int MAXIMUM_CAPACITY - 4 bytes
* int TREEIFY_THRESHOLD - 4 bytes
* int UNTREEIFY_THRESHOLD - 4 bytes
* int MIN_TREEIFY_CAPACITY - 4 bytes
* int size - 4 bytes
* int modCount - 4 bytes
* int threshold - 4 bytes
* float DEFAULT_LOAD_FACTOR - 4 bytes
* float loadFactor - 4 bytes
* Node<K, V>[] table - 8 bytes

Node defines:

* int hash - 4 bytes + 4 bytes padding
* K key - 8 bytes
* V value - 8 bytes
* Node<K, V> next - 8 bytes

Node<K, V>[] should have a size of 2048, if I remember how HashMap works. So it defines:

* int length - 4 bytes + 4 bytes padding
* Node<K, V> reference * 2048 - 8 bytes * 2048 = 16384 bytes.

So the HashSet should be (each figure includes the 8-byte object header):

* HashSet: 32 bytes
* HashMap: 64 bytes
* Node<K, V> inside Node<K, V>[]: 40 bytes * 1000 nodes = 40000 bytes
* Node<K, V>[] inside the HashMap: 16400 bytes

Total: 56496 bytes for the HashSet, without taking into account the String contents
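The same tally as code, including where the 2048 comes from. This is a sketch that only reproduces the numbers in this answer: it assumes 8-byte headers and references, a default initial capacity of 16, and a load factor of 0.75, with the table doubling whenever the entry count exceeds capacity * 0.75. The class and method names are invented for the example.

```java
final class HashSetFootprint {
    // Figures assumed in this answer, not measured on a real VM.
    static final int HEADER = 8;
    static final int REFERENCE = 8;

    // Start at 16 buckets and double while the entries exceed the 0.75 threshold.
    static int tableCapacity(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75f) {
            capacity *= 2;
        }
        return capacity;
    }

    // HashSet: header, serialVersionUID, PRESENT, map.
    static int hashSetBytes() {
        return HEADER + 8 + REFERENCE + REFERENCE;
    }

    // HashMap: header, one long, eight ints, two floats, table reference.
    static int hashMapBytes() {
        return HEADER + 8 + 8 * 4 + 2 * 4 + REFERENCE;
    }

    // Node: header, hash (+ padding), key, value, next.
    static int nodeBytes() {
        return HEADER + (4 + 4) + 3 * REFERENCE;
    }

    // Node[]: header, length (+ padding), one reference per bucket.
    static int tableBytes(int capacity) {
        return HEADER + (4 + 4) + REFERENCE * capacity;
    }

    static int totalBytes(int entries) {
        return hashSetBytes() + hashMapBytes()
                + nodeBytes() * entries
                + tableBytes(tableCapacity(entries));
    }

    public static void main(String[] args) {
        System.out.println(tableCapacity(1000));          // 2048
        System.out.println(totalBytes(1000));             // 32 + 64 + 40000 + 16400 = 56496
        System.out.println(totalBytes(1000) + 1000 * 88); // 144496 with 88-byte strings
    }
}
```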
So at least by my calculations, the total space taken should be somewhere around 144496 bytes -- about 141 kilobytes (kibibytes for the pedantic). To be honest, this seems like it's more than a bit on the small side, but it's a start.
I can't get the Instrumentation interface working at the moment, so I can't double-check. But if someone knows what he/she is doing, a comment pointing out my mistakes would be welcome.
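For reference, a minimal sketch of how the Instrumentation check is usually wired up, in case someone wants to try it. The class name `ObjectSizeAgent` and the jar name below are made up; `premain` and `Instrumentation.getObjectSize` are the standard `java.lang.instrument` entry points. Note that `getObjectSize` reports an implementation-specific estimate for the one object you pass in (a shallow size), so the strings, nodes, and table would still have to be walked and summed separately.

```java
import java.lang.instrument.Instrumentation;
import java.util.HashSet;

final class ObjectSizeAgent {
    // Set by the JVM when the class is loaded as a Java agent.
    private static volatile Instrumentation instrumentation;

    // Invoked before main() when the JVM is started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of a single object, as estimated by the running VM.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }

    public static void main(String[] args) {
        if (instrumentation == null) {
            System.out.println("Run with -javaagent to enable measurements.");
            return;
        }
        // Shallow sizes only: the HashSet object itself and one String object.
        System.out.println(sizeOf(new HashSet<String>()));
        System.out.println(sizeOf("0123456789"));
    }
}
```

To run it, the class has to be packaged in a jar whose manifest contains `Premain-Class: ObjectSizeAgent`, and the JVM started with something like `java -javaagent:sizeagent.jar -cp sizeagent.jar ObjectSizeAgent` (the jar name is just a placeholder).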