2

I have come across a curious situation where students (I'm a teaching assistant in this context) have to implement their own version of a Singly Linked List (SLL) and compare it empirically with the Java standard library implementation of a Doubly Linked List.

And that's where it gets weird: I've seen multiple students note that the DLL profiles at roughly 0.5% extra space utilization compared to an SLL containing an equal number of elements of the same type. All the while basic analysis of the data structures tells me that a SLL has 2 references per node (1 to the next element, and 1 to the contained value) whereas a DLL has 3 (an additional one reference to the previous element). In other words, that is 50% more space usage per node (disregarding the size of the contained value).

The contained values are mostly Integer value objects so I don't think the size of the contained values matters too much here.

What causes this 2 orders of magnitude difference? I'm not entirely sure that "JVM/collections libraries optimization" can cover the whole difference; otherwise it'd have to be one hell of a JVM/java std lib optimization.

jjpe
  • 664
  • 4
  • 13

3 Answers3

3

The space used should be the same on the Oracle JVM/OpenJDK for a 64-bit JVM with 32-bit references (Compressed oops)

For a Node with two references

header: 12 bytes
two references: 8 bytes
alignment padding: 4 bytes

total is 24 bytes per node as all objects are aligned by 8 byte offsets by default.

For a Node with three references

header: 12 bytes
three references: 12 bytes
alignment padding: 0 bytes

the total is 24 bytes again.

The real question is why did you see any difference at all. This is most likely due to inaccurate memory accounting.

The JVM uses a TLAB (Thread Local Allocation Buffer) This allows the threads in a JVM to grab chunks of memory and allocate concurrently from those chunks. On the downside, you only see how much memory is used from the common Eden space i.e. you have no idea how much of each chunk is used.

A simple way around this is to turn off the TLAB which gives you byte-by-byte memory account (at the cost of some performance)

i.g. try -XX:-UseTLAB on the command line to disable the TLAB and you will see the size of every object allocated.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
2

It's hard to see why there's any difference at all.

First note that a Java object has significant overhead in the form of its header. That reduces your 50% expectation right there.

Next, when you consider that the references are typically 4 bytes wide (given compressed OOPs on a 64-bit HotSpot), but that memory is always allocated in blocks whose size is divisible by 8, you can see that what was left as unused 4 bytes at the end of one structure becomes your third reference in the DLL example.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
2

In addition to what Marko said about the memory overhead of each linked-list node object, your "Integer value objects" stored in these nodes might not be as small as you think. The element type of java's DLL is a generic parameter, and generic parameters in java are always objects, (never primitives,) so even though you may be adding ints to Java's DLL, they get converted to objects (see boxing/unboxing) and stored as objects.

If your students' SLLs store actual primitive ints, then I would actually expect their classes to occupy considerably less space than that of Java's DLL. If your students store Integer objects, then you should consider the fact that the space occupied by these objects further dilutes whatever difference exists between the expected space occupied by the two classes.

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • Hm I also hadn't considered that Integer objects can actually be quite big. Even then, as far as objects go I think they're fairly small (most of the overhead goes through the Object class and it needs to be paid anyway with generics just like you said). Unless there's something else I'm not taking into account (: – jjpe Feb 24 '15 at 14:19
  • @jipe Assume that an `Integer` instance occupies 16 bytes and that a `LinkedListNode` occupies 24 bytes. So the difference between a design with `int` field inlined into the node and the design with a reference to an `Integer` instance is quite significant: in one case you have 16+24 bytes, in the other just the 24 bytes for the node. – Marko Topolnik Feb 24 '15 at 14:35
  • I fully agree, but inlining is poor (non-reusable) design. The only place I can think of where it would be acceptable to do this is in resource-constrained systems e.g. embedded devices, but experience tells me that code there often isn't that clean anyway. – jjpe Feb 24 '15 at 15:26