Does Java's reference system help save memory when storing data in multiple containers?

Question

Basic question

Consider this object:

Person alice = new Person("Alice", 0);

It's added to two ArrayLists:

ArrayList<Person> foo = new ArrayList<Person>();
ArrayList<Person> bar = new ArrayList<Person>();

foo.add(alice);
bar.add(alice);

At this point, are there three Person objects in heap memory (alice plus one in each ArrayList)? Or is there one Person object in memory along with three references (pointers) to it?

Motivation for/more involved question

A Person object has two fields, a String and an int. Say I have many Person objects, and I want to have them all sorted in two different ways at different times (sometimes by their Strings alphabetically, sometimes by their ints numerically).

It seems this can be done in two ways:

Have one container, like an ArrayList, of the objects and sort it on demand whenever I want to change the sorting scheme
Have two containers, one which sorts Persons by their Strings alphabetically and one which sorts by their ints numerically

The first way is time inefficient but space efficient.

The second way is time efficient but space inefficient (say Person objects are very large).

In languages like C++, doing this efficiently in time and space would entail the second way but with multiple containers of pointers to a single collection of Person objects. Because this is difficult, people often recommend things like Boost's multi_index_container, which does exactly this. In Java, I've seen people elide this complexity, something that seems possible because all Java objects are behind pointer-like references at all times.

Are they correct in doing so? In Java, is doing things the second way in a space-efficient manner as simple as having multiple containers with redundant references to the same objects?

Bonus question

Is this true or untrue in other languages, like JavaScript, Python, Ruby, C#, Go, Rust, etc.?

Speaking for Java, C# and JavaScript: there are two "types" of types - primitives, and objects. Primitives are stored in the stack and therefore a variable, as a representation of the memory point in the stack, has the value of a primitive. Objects are stored in the heap and all variables only store references to them (note that references are a bit different from pointers but you can view them as pointers under the hood that you can't manipulate). All data is passed by value - both primitives and references (with the exception of C# ref and out keywords). — 5ar, Feb 04 '19 at 00:25
Therefore yes, the array holds references to those objects and not the object itself. Java and JavaScript won't let you store anything other than primitives and references directly to the stack, while C# has structs that are similar to the ones in C/C++, but they are not used that much. Also C# has pointers, which are or at least should never be used. — 5ar, Feb 04 '19 at 00:29
Python handles its memory a bit differently, it treats all of its memory like objects, and all of the variables' values basically act as references. I can't really speak for Ruby, Go and Rust since I've never used them. However I know that Go has pointers and is often compared to C, yet it has a built in garbage collector so I'm not sure are those pointers the same as the ones in C. Ruby on the other hand is often compared with Python so I wouldn't be surprised if they handled memory similarly, however this is just my premature assumption. — 5ar, Feb 04 '19 at 00:43
@5ar It seems [Go's pointers](https://golang.org/doc/faq#methods_on_values_or_pointers) are indeed very much like C's; they're used to reference data you don't want to copy into a function's scope, but garbage collection saves you the responsibility of having to free them. — 6equj5, Feb 04 '19 at 23:00
To try to complete the list in my "Bonus question" here (corrections welcomed), C#, Java, JavaScript, Python, and Ruby all seem to handle object memory similarly: objects added to several different containers would not be cloned to have redundant copies of themselves in memory (unless you went out of your way to clone them, and if that's possible) and the containers would merely be filled with references to the objects. C, C++, Go, and Rust are the languages in which the objects *would* be deeply copied implicitly and where working with pointers (or refs in Rust) would be needed to avoid this. — 6equj5, Feb 04 '19 at 23:06
Correct, any variable or attribute that is not considered a primitive / value-type (depending on the language terminology) does not hold the object itself, but rather a reference to the object. That reference is passed around by value (something to keep in mind when passing objects to functions that have side effects) unless stated otherwise (e.g. `ref` and `out` in C#). If you wan't to create a new object you have to do it explicitly. All of those languages have some kind of mechanism for copying object (however keep in mind that there is a difference between shallow and deep copying) — 5ar, Feb 06 '19 at 14:52
Additionally, when no variable or attribute holds a reference to some object, the garbage collector will schedule that object for deletion (when and how it is deleted depends on the language and garbage collector) — 5ar, Feb 06 '19 at 14:54

Basil Bourque · Accepted Answer · 2019-02-04T00:53:30.993

At this point, are there three Person objects in heap memory (alice plus one in each ArrayList)?

One Person object in memory, with three object references (pointers). The Person alice variable you declared holds one reference, while each List you created holds a single element for an object reference (pointer). So a total of 3 references.

Technically speaking, a ArrayList is a list of object references (pointers), not objects. When we say ArrayList<Person>, we mean a list of pointers that are only allowed to point to objects of class Person but not School or Invoice. So the size of the Person object, School or Invoice object is irrelevant in terms of size in the List. Every List has elements containing only an object reference in each element, so they are always the same size.

The tricky part to understand is that Java syntax was purposely designed to hide the reference/pointer from us as Java programmers. This gives us the convenience in our day-to-day programming work of thinking of the List as containing Person objects. We know the underlying structure as actually references/pointers, but we need not think about that.

When we retrieve a Person from the List:

Person p = foo.get( 0 ) ;  // Annoying zero-based index counting. So zero is the first element.

…Java does the work of accessing the element in the list, obtaining its stored reference/pointer, then following that reference/pointer to look up that object’s place in memory, and return that memory location (basically, a number) to be stored in the reference-to-a-Person-object variable named p here in this example.

When we get the name member from that Person object:

String name = p.name ;

…the JVM follows that reference/pointer stored in p to locate the actual object floating around elsewhere in memory, then moving through that chunk of memory to find the name member variable within the object.

In contrast, the C language makes these pointers quite obvious and available. This makes programming much more confusing and complicated, and much more error-prone.

You need not understand all this when starting out programming. Occassionally revisit the topic, mull it over, and eventually it will click to make sense. Diagrams can help. For more info and drawings, see a couple of my Answers to related Questions, here and here. Search Stack Overflow, as this topic has been covered many times such as here and here.

Sorting

Using the Java Collections Framework, you have two options to access a group of objects in sorted order: Either re-sort when switching order, or maintain two collections. The size of the content within the objects is irrelevant as only pointers are involved in the collections.

Of course third-parties are free to develop their own collections to exhibit other behavior.

Does Java's reference system help save memory when storing data in multiple containers?

Basic question

Motivation for/more involved question

Bonus question

1 Answers1

Sorting