Why is there no heap overhead for the base reference?

Question

From Stroustrup's Foundations of C++, he offers a pure object-oriented language (on Page 4).

class complex { double re, im; /* … */ };
complex a[ ] = { {1,2}, {3,4} };

He assumes a in the pure object-oriented language is allocated on the heap, and a's memory layout looks like:

The likely size is 3*sizeof(reference)+3*sizeof(heap_overhead)+4*sizeof(double). Assuming a reference to be one word and the heap overhead to be two words, we get a likely size of 19 words to compare to C++’s 8 words. This memory overhead comes with a run-time overhead from allocation and indirect access to elements. That indirect access to memory typically causes problems with cache utilization and limits ROMability.

I noticed that the uppermost reference has no heap overhead (white rectangle).

I guess it is a general phenomenon rather than specified for the pure OO example language.

But I cannot find any reference (I do admit this is not a search-engine-friendly question).

Update

Thanks for your answers. However, I forgot to post my original guess. (Apologies, it is my fault.)

Actually, I also thought because a itself may be allocated on the stack or somewhere else, it will not have heap overhead itself. But later, I noted BS also says:

Compare this with a more typical layout from a “pure object-oriented language” where each user-defined object is allocated separately on the heap and accessed through a reference...

on Page 4.

So, I think that he has limited the implementation to heap-only (or stack-less).

(Of course, maybe I'm reading too much into this sentence.)

a program will always use a heap **and** a stack. both should exist — Joseph D., May 24 '18 at 06:20
@codekaizer In history there exists stack-less and heap-less design https://stackoverflow.com/a/50479025/6949852 — Chen Li, May 24 '18 at 06:23
@陳力 he talks about languages like java where all variables are pointers\references without exception. Stack is implemented first on all stckless machines. Mind you, C was created when no computer had CPu-based stack. call stack was emulated. Without stack of either type you can't call functions. FOrmal C++ documents talk about storage type and duration, not about how that storage was implemented. — Swift - Friday Pie, May 24 '18 at 06:23

Dietmar Kühl · Answer 1 · 2018-05-24T06:01:36.077

The top-level reference lives wherever it lives. Here are a few options:

It can live on the stack. a in the code could be represented like that.
It is embedded into an existing object.
Memory is allocated to just hold the handle. In this case it would have a memory overhead.
It lives as a global entity.
A reference can live in a register. a in the code could also be represented like that in which case it kind of has neither a heap overhead nor a memory overhead: instead it "only" requires the use of a register.

The main realisation in this context is that these references are not actual objects, i.e., you can't have a reference to a reference. As a result there is no need to keep the position of the reference fixed. They can be (and they are when they are actually represented) actual values. The value of a reference in a typical implementation is the address of the object.

The other entities are objects:

Class complex is used to create an object which actually embeds to odd double "objects" which are also values rather than actual objects.
a is an array object in this model referencing two complex objects.

Thanks : ) Please take a look at my update in the question. – Chen Li May 24 '18 at 06:15 — Chen Li, May 24 '18 at 06:15

score 2 · Accepted Answer · answered May 24 '18 at 05:50

2

You get heap overhead when you store something on the heap. The two complex values are stored on the heap, so they get overhead. The array of references is also stored on the heap, so it gets overhead.

However, the reference to the array is not stored on the heap. Usually, this reference will be either placed on the stack as a local variable, or that stack storage might be optimized away by using a CPU register instead. In either case, the reference itself is just a local pointer variable which does not have heap allocation overhead itself.

answered May 24 '18 at 05:50

cmaster - reinstate monica

38,891
9
62
106

Thanks, cmaster. Actually, I also had this answer for myself before posting this question. But later, I saw BS also say: `each user-defined object is allocated separately on the heap` in Page 4. `a` it self is also a `user-defined object`, so I think he also limits the implementation to heap-only(or stack-less) – Chen Li May 24 '18 at 05:57
@陳力 You cannot compile a program to a stack-less executable. Simply because you *need* some references into the heap to be able to work with it. You cannot retrieve any information without knowing where it is to be found. In a compiled program, the CPU has some registers which allow it to find its stack and code. And that stack contains references (local variables) that allow it to find the objects on the heap. That is how the information is anchored to the CPU, and without this anchoring, there is no way to magically know where stuff is. – cmaster - reinstate monica May 24 '18 at 06:10
Thanks! you convince me – Chen Li May 24 '18 at 06:31
1

@cmaster even now exist CPU architectures (most of RISC, MIPS, 51-derived controllers, etc) that do not work that way. but there is always some structure of code that allows emulation of that behavior, for sake of using function-based structure of code. Stack registers might be not present, but then there is "stack" data structure that stores addresses\values\return address, etc. – Swift - Friday Pie May 24 '18 at 06:34
@Swift-FridayPie You mean like that there is no `push`/`pop` instructions? Yeah, that happens. On PPC, you would create a stack frame with an invocation like `stwu r1,-StackFrameSize(r1)` instead. (If I remember correctly, `r1` was the register that was used to hold the stack pointer *by convention*. It could have been any other register, as far as the CPU was concerned.) But the fact remains that there is some place where you can push return addresses, somehow. – cmaster - reinstate monica May 24 '18 at 06:50
Also, there are some (older or very constrained) microcontrollers where there's no such a thing as a proper stack - you are expected to use pretty much only globals. But in these cases you really aren't going to have a heap as well. – Matteo Italia May 24 '18 at 06:56
1

@MatteoItalia Ok, I should have said: You cannot compile a **recursive** program to a stack-less executable. As long as your functions are not reentrant, you can get away with storing their local variables in some globally accessible location, and you can hard-code their addresses into the CPU instructions. In that case, you can really get away without a stack. Nevertheless, even in this setting, you have the local variables (=references) which themselves are stored somewhere without allocation overhead. At some point, you always have this anchoring, even without a stack. – cmaster - reinstate monica May 24 '18 at 07:03

Matteo Italia · Answer 3 · 2018-05-24T07:01:48.820

The idea here is that the top level reference is either a local/parameter on the stack/in a register (where there's no heap overhead) or as a part of some bigger structure (that in turn may be allocated on the heap, but we don't count its overhead here, as it's not specific to this particular reference, but it's shared between the other data members).

Actually, I also thought because a itself may be allocated on stack or any else, so itself will not have heap overhead. But later, I noted BS also says:

Compare this with a more typical layout from a “pure object-oriented language” where each user-defined object is allocated separately on the heap and accessed through a reference...

in Page 4.

So, I think that he has limited the implementation to heap-only(or stack-less).

(Of course, maybe I read too much into this sentence)

I think that you are reading too much into it. The real-world languages Stroustroup is referring to (such as Python, Java or C# - if we limit ourselves to boxed types) do have a "regular" stack containing a plain array of references.

Thanks : ) But please take a look at my updates in question. — Chen Li, May 24 '18 at 06:17

Joseph D. · Answer 4 · 2018-05-24T06:31:59.463

complex a[ ] = { {1,2}, {3,4} };

Let's break this - bottom-up approach.

You have two complex objects that are list initialized: Complex{1,2} and Complex{3,4}.

Complex{ double, double }

Each complex object has the address of the heap (white) and the space occupied (blue) by the double values.

Next is the outer { brace which is to list-initialize an array of Complex objects.

array { Complex1, Complex2 }

This has the address of the array object (white), and the two references (blue) to the two Complex objects.

Lastly, there's no overhead for complex a for this is just a local variable that lives in the stack.

Thanks : ) Please take a look at my update in the question. – Chen Li May 24 '18 at 06:15 — Chen Li, May 24 '18 at 06:15

Why is there no heap overhead for the base reference?

Update

4 Answers4