3

If we have

int a = 123;
int b = 123;

will we end up with two distinct memory blocks allocated for the integer 123, or only one memory block allocated for 123, with variables a and b both located at the same memory address?

What about

int a = 123;
int b = a;

Does this change the answer?

I tried to print out the memory addresses of both variables in C++ and found that they are different:

  int a = 123;
  cout << &a << endl; // 0x7fff46512da0
  int b = 123; 
  cout << &b << endl; // 0x7fff46512da4

Does this mean in that specific environment the program stores duplicate int 123 at two different memory blocks?

Does the answer change if the values are strings?

The reason I am asking this question is that I found out in Python the memory addresses are always the same for primitive values if they are equal. I heard it is because of the constant pool. I wonder if this still applies to C and C++?

e.g.

a = 123
b = 123

print(id(a))  # 9792896
print(id(b))  # 9792896
Jonathan Leffler
Joji
  • 1
    Separate scalar variables are completely unrelated to each other and don't know that *other* variables might have the same values: `a` and `b` are separate integer variables. – Steve Friedl Jan 09 '22 at 21:16
  • 3
    What language are you using C or C++? If it's C++, there is an [as-if](https://en.cppreference.com/w/cpp/language/as_if) rule. The compiler can optimize that code in such a way where it looks nothing like your original code, as long as the results are the same. Depending on what you do with `a` and `b`, I have seen compilers do amazing things, where at the end there is no `a` or `b`, but instead the final answer is generated by the compiler. – PaulMcKenzie Jan 09 '22 at 21:19
  • 2
    Relevant: [**What exactly is the "as-if" rule?**](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule) TLDR Summary: you don't have to end up with any particular result as the compiler is free to do pretty much anything it wants to if it produces the same observable behavior. – Andrew Henle Jan 13 '22 at 21:47
  • The compiler can optimize out the variable and use a single stack allocation block, but later, when the variables are used, it can move them into separate registers and change values accordingly. If you don't have to access one of the variables any further, chances are it might not even get compiled. – Irelia Jan 16 '22 at 16:30

5 Answers

13

C and C++ programs have dual natures. The meaning of a program is described using a theoretical model with an abstract computer that executes the program literally as the source describes it. In this model, each object has different memory from every other object, because an object is by definition reserved memory, associated with a type. (Note that string literals in source code may overlap, referring to one common array.)

However, a compiler is not required to produce assembly code that executes this meaning literally. It may produce any program that has the same observable behavior as the original source code. Observable behavior includes the output the program writes to files, input/output interactions, and accesses to volatile objects. Between instances of observable behavior, the compiler can optimize the program, including eliminating unnecessary memory use.

Whenever you define an object, the compiler might not reserve memory for it at all if it is able to make your program work without using such memory. For example, in:

#include <stdio.h>

int main(void)
{
    int a = 123;
    int b;
    scanf("%d", &b);
    printf("%d\n", a+b);
}

the compiler is likely to perform the calculation by loading the constant 123 as an immediate operand of an instruction, without reserving separate memory for it.

If the compiler does need memory, perhaps because it does not have enough processor registers to keep everything it is working with in registers, then it might keep only one copy of a constant that is used to initialize two objects which are never changed and whose addresses are not taken.

If you pass the objects by address to other routines or give them different values, the compiler is more likely to reserve separate memory for them, depending on circumstances.

Eric Postpischil
  • Sorry I am really illiterate when it comes to these low level details. "the compiler is likely to perform the calculation by loading the constant 123 as an immediate operation of an instruction, without reserving separate memory for it." I don't quite understand how we can use constant `123` without reserving separate memory for it. If there is no memory for it, where does `123` come from exactly? – Joji Jan 10 '22 at 03:57
  • 3
    @Joji: If constants were only in memory, how would instructions access them? They need to specify where in memory the data is. So instruction sets generally provide some way to generate at least small constants just from the instructions. Mostly, these are built into the instruction codes; there is an “immediate” operand that contains the value. For example, there might be a “Load Immediate” instruction `LI R3, #4` that puts the value 4 into register 3. For `a+b`, where `a` is known to be 123, the compiler might generate `LI R3, #123` followed by `ADD R4, R3`. – Eric Postpischil Jan 10 '22 at 12:06
  • 1
    @Human-Compiler: `b` is given a value by `scanf`. Also, in C, using the value of an uninitialized automatic object if its address has been taken does not have undefined behavior. The value is indeterminate, but the behavior is not undefined. – Eric Postpischil Jan 14 '22 at 20:37
  • Does `volatile` guarantee that two variables holding the same value always have different memory addresses? As I understand, `volatile` guarantees accessing an object (reading / writing). Does it follow that `volatile` guarantees that, for example, in case of reading `a` and `b` (both holding value `x`) their memory addresses are always different? – pmor Jan 15 '22 at 00:19
  • @EricPostpischil Can I ask if "loading the constant 123 as an immediate operation of an instruction, without reserving separate memory for it." is the same thing as "loading constant 123 directly into CPU registers"? – Joji Jan 17 '22 at 02:07
  • @Joji: Yes, pretty much. A computer processor might have an instruction such as `LI Rn, #value` which loads the value into register `Rn`. The value is encoded directly in the bits of the machine instruction. A processor might have instructions that use immediate operands in other ways, such as `ADD Rn, Rm, #value`, which would add value to the contents of `Rm` and store it in `Rn`. Assemblers might use characters other than `#` to mark immediate operands, such as `$`. – Eric Postpischil Jan 18 '22 at 21:29
8

Eric's answer is very good. I will add some practical cases using C as the base language for my answer.

Take the following code:

#include <stdio.h>

int main() {

    int a = 123;
    int b = 123;

    printf("%d", a);
    printf("%d", b);
}

If you compile this code with the x86-64 gcc 11.2 C compiler (Intel syntax), the following assembly is produced:

.LC0:
        .string "%d"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], 123
        mov     DWORD PTR [rbp-8], 123
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, DWORD PTR [rbp-8]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        leave
        ret

As you can see, storage is provided for the 2 variables.

Now, if I use the -O optimization flag, the following assembly is produced:

.LC0:
        .string "%d"
main:
        sub     rsp, 8
        mov     esi, 123
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     esi, 123
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        add     rsp, 8
        ret

The compiler just uses the 123 literal: because no changes are made to those variables, it figures they can be treated as constant values and no storage is needed.

That doesn't mean that the literal exists in the ether; it has to be embedded in the assembly.

In Python, everything is an object, even values of primitive types. Notice that print(id(a)) and print(id(123)) give the same result: in both cases it is the identifier of the specific object 123 (a pointer or reference to it, if you will), and it has nothing to do with the variable to which the object is assigned.

C and C++, on the other hand, are not like Python: int literals are not objects and there are no references to them, just the bits. For the 123 literal example, let's try to print its address:

printf("%p\n", (void*)123);

What happens here:

mov esi, 123 // sets ESI register to 123
mov edi, OFFSET FLAT:.LC0 //unimportant, gets the specifier string
mov eax, 0 // sets EAX register to 0
call printf // prints the literal

The output:

0x7b // 123 hexadecimal

Now let's also print the address of a variable that has 123 assigned:

int a = 123;   
printf("%p", (void*)&a);

Looking at the assembly we can spot the difference:

mov DWORD PTR [rsp+12], 123 // moving `123` literal to its address
lea rsi, [rsp+12] // placing the address in the register
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf // printing the address

In this case the address of the variable is printed, as expected. The literal was placed in the memory location where the variable a lives, therefore we can print its address.

If you have two variables with the same value, they're probably going to have different addresses, but if the compiler finds a way to have only one address or no memory storage at all for the two variables, and still produce the desired outcome, there is no rule preventing it.

There are few to no constraints in the language standard on what a compiler can do: it just has to conform to the rules of the standard and produce a program that behaves in a consistent, defined manner in all circumstances, provided that the source is correctly coded.

The assignment of a to b by itself doesn't change much, nor does the fact that the literal is a string: there will probably be only one copy of the same literal (especially considering that string literals assigned to pointers are immutable), unless other constraints prevent it.

Side note:

C and C++ are different languages. I want to point this out explicitly because C++ is often mistakenly regarded as a superset of C. Though that may have been the case in the early years, it is not true today: these are very different languages, despite the fact that C++ retains compatibility, for the most part, with C code.

anastaciu
  • hey thanks for the reply! You mentioned that the compiler could just use the 123 literal, in which case we might even end up with no storage for any of the variables. If we don't have memory for any of the variables, then do we at least have memory reserved for integer `123` somewhere? Sorry if my question is vague but I am just struggling to understand these low level details. – Joji Jan 10 '22 at 03:54
  • "mov DWORD PTR [rsp+12], 123 // moving `123` literal to its address" does this mean it copies the bits that actually make up`123` in that address? – Joji Jan 17 '22 at 01:21
  • @Joji, yes. freely speaking. It's normally represented in hexadecimal which, if you think about it, each hex digit represents a 4 bit binary value. But, what it really is, is electric charge ;) – anastaciu Jan 17 '22 at 08:52
  • thanks since we are on this topic - I have seen people said something like "memory of array of bytes. " vs. some data stored in memory are bits... which one is correct - saying memory is bytes or bits? – Joji Jan 19 '22 at 01:02
  • @Joji, data is stored in bits, 0s and 1s my friend, that is the correct definition, I remind you that a byte is nothing less than 8 bits. *memory of array of bytes* doesn't seem like a canonical term to me, I would accept it as a colloquial term. – anastaciu Jan 19 '22 at 08:21
5

will we end up with two distinct memory blocks allocated

As far as the abstract machine is concerned: Yes, the variables have overlapping storage duration, so they must have distinct memory addresses.

As far as the language implementation is concerned: It depends. There could even be no memory used at all if it isn't needed.

Does this change the answer?

No.

eerorika
3

If we have

int a = 123;
int b = 123;

will we end up with two distinct memory blocks allocated for the integer 123 or we only end up with one memory block allocated for 123 and variable a and b are just loaded at the same memory address?

Regardless of what values they are initialized with or hold at any time, the two objects declared by those two distinct declarations are logically distinct objects, with, therefore, logically distinct storage.

Compilers may play all manner of games and trickery under the hood, but there is no way for a conforming C program containing those declarations, running on a conforming C implementation, to perceive them as referring to the same object or having the same or overlapping storage.

What about

int a = 123;
int b = a;

Does this change the answer?

No. The initial values specified, if any, have nothing to do with whether the two objects declared have the same storage (as far as C semantics are concerned or can discern).

I tried to print out the memory addresses of both variables in C++ and found that they are different

  int a = 123;
  cout << &a << endl; // 0x7fff46512da0
  int b = 123; 
  cout << &b << endl; // 0x7fff46512da4

Does this mean in that specific environment the program stores duplicate int 123 at two different memory blocks?

Yes.

Does the answer change if the values are strings?

No, but see below. Whether by "strings" you mean std::strings or arrays of char or pointers to char, separately declared objects are separate objects, with separate storage.

However, different appearances of C string literals with the same content do not necessarily have separate storage. These are not declared objects, so this does not conflict with the "No"s above, but it does mean that in a case such as this ...

const char *a = "foo";
const char *b = "foo";

... it might be true that a == b. Even then, however, you can still rely on &a == &b to evaluate to false, because the storage for the pointers identified by a and b is different, even if they point to the same object. If you don't understand why I'm calling out this case then that's fine -- all the better, in fact.

The reason I am asking this question is that I found out in Python the memory addresses are always the same for primitive values if they are equal. I heard it is because of the constant pool. I wonder if this still applies to c/c++?

Python is very different from C++, and even more different from C. Python's built-in types (its language specification does not use the term "primitive types") are all object types, analogous to C++ classes. Values of Python built-in types are analogous to C++ class instances. For most numeric types among them, these require several times more storage than would be needed to represent the numeric value alone, so for efficiency, Python has a constant pool so that some of those objects can be shared rather than duplicated. That works out because the types involved are immutable, so for the most part, only the values they represent are important, not the identities of the objects containing them.

That is not the case for C's built-in types, whether in C or in C++. These are not represented by wrapper objects, just by the bits that make up the values themselves. Moreover, it's not even an apples-to-apples comparison. All Python variables behave similarly to C++ references (and even more like Java references), and there are no pointers. Thus, in Python you cannot even talk about the storage for a variable itself, only about the storage for the object to which it refers.

Different Python variables do have their own, separate storage in exactly the same sense that C and C++ variables do, but you cannot look at it or touch it except to determine the object to which the variable refers or to make it refer to a different object. But you know that they have distinct storage because if they didn't then assigning a new value to one would change the value of the other as well.

John Bollinger
  • A few followup questions if you don't mind: 1. "However, different appearances of C string literals with the same content do not necessarilly have separate storage. These are not declared objects" why is that strings are different than integer as you mentioned strings are not declared objects but integers are? 2. "All Python variables behave similarly to C++ references" I am struggling to understand the differences between a reference and a pointer here. " only about the storage for the object to which it refers." is the verb`refer`the same thing as `point`? – Joji Jan 12 '22 at 01:47
  • @Joji, (1) string literals, such as `"foo"` are not *declared* objects, which essentially just means that they do not have identifiers assigned to them. They nevertheless do represent objects, which means that there is specific associated storage for their (`char` array) values. There are reasons for this design decision, but different choices could have been made, so the "why" boils down to "because Dennis Ritchie said so". Values of type `int`, such as `1`, are neither declared nor objects, just values. This is again because Dennis Ritchie said so, though [...] – John Bollinger Jan 12 '22 at 02:08
  • [...] this is logical and predictable choice based on the purpose and development context of the C language. These are both to be distinguished from *variables* of any type (`char a[4];` or `int b;`) which are declared objects. – John Bollinger Jan 12 '22 at 02:11
  • @Joji, (2) Pointer and reference values both function to provide access to objects stored elsewhere. In languages that have only one or the other, the choice of terminology is largely an arbitrary decision. C++ is unusual in having both. They have different declaration and usage syntax, but similar function (details are too much for a comment). The verb `refer` describes a reference's indirection to another object. The verb `point` describes a pointer's indirection to another object. These are analogous, and sometimes they are used interchangeably. – John Bollinger Jan 12 '22 at 02:30
  • a few followup questions if you don't mind: 1. it seems like generally `reference` and `pointer` means the same thing - a variable whose value is a memory address of the object which it points/refers to. Based on this, why is that "All Python variables behave similarly to C++ references (and even more like Java references), and there are no pointers." - why is that there are only references in Python not pointers since they are the same essentially? – Joji Jan 12 '22 at 05:42
  • just to double-check my understanding: "The verb refer describes a reference's indirection to another object. The verb point describes a pointer's indirection to another object". Are all references and pointers themselves variables with the values being memory addresses of the other objects (therefore you said `indirection`)? On the other hand normal variables "have" the actual values, i.e. they are an alias for some block of memory at which the actual values are stored. – Joji Jan 12 '22 at 05:46
  • Generally yes, @Joji, but (i) the word "pointer" is used both to describe "variables with the values being memory addresses of the other objects" and to describe the values themselves that such a variable may take. Similar is true of the names of other types and type categories. (ii) Although they have very similar purpose and behavior pointers are not references and references are not pointers, notwithstanding the usage of verbs "point" and "refer". In some languages this is largely a matter of conventional terminology, but in others, such as C++, they are genuinely distinguishable. – John Bollinger Jan 12 '22 at 13:29
  • Hey thanks for the reply. Could you expound on why "there are no pointers in Python"? since everything is an object in Python so I thought every variables are of pointer types which point to those objects. Also could you expound a little bit on the word "indirection" you used earlier? – Joji Jan 12 '22 at 21:43
  • @Joji, there are no pointers in Python because Python does not use that terminology. It would serve no purpose anyway, because calling all variables pointers would not add any useful distinction. The statement you quoted has nothing to do with language semantics. – John Bollinger Jan 12 '22 at 21:50
  • "That is not the case for C's built-in types, whether in C or in C++. These are not represented by wrapper objects, just by the bits that make up the values themselves. " does this mean that in C/C++, values like number or strings e.g. `int a = 123` are made of the number or string itself, as opposed to languages like python where the variable `a` would be a pointer which has the address to that value, as opposed to having the actual value at the variables' memory locations? – Joji Jan 12 '22 at 22:00
  • @Joji, for numbers, yes, that's exactly what it means. As for strings, you'll have to specify *exactly* what you mean. There are several different things that people may mean when they use that term with respect to C, and one or two additional ones that they may mean when they use it with respect to C++. And incidentally, C and C++ are two rather different languages with a common subset. There is no such thing as "C/C++". – John Bollinger Jan 12 '22 at 22:04
  • got it! I heard there might be string interning and constant pools so it might be handled very differently. Are there any other kinds of data types outside of number in C++ are represented by the bits that make up the values themselves? Maybe boolean? Also I guess when you said "represent" there you meant the variable represents/ is an alias of/ is loaded at some memory location at which its value is stored right? – Joji Jan 12 '22 at 22:22
  • That's not as straightforward a question as you may think, @Joji. I am inclined to say that *all* types in C and all but references and (maybe) class instances in C++ have that property, but that depends on a certain understanding of what "the bits that make up the values themselves" are. – John Bollinger Jan 12 '22 at 22:30
  • @Joji, when I said "These are not represented by wrapper objects", I was talking about values not variables. This is *in addition to* the difference between direct and indirect storage that I discussed afterward. – John Bollinger Jan 12 '22 at 22:34
  • About the direct and indirect storage you mentioned earlier: Does "direct" mean that something like `int a = 1`, the variable has/is loaded at some memory address at which the value of integer `1` is stored and "indirect" means pointer variables like `Obj *ptr` - the variable has some memory address at which another memory address of the actual object value is stored? Please correct me if I am wrong here. And I guess in C++, like pointers, references are also "indirect storages"? – Joji Jan 12 '22 at 23:43
  • @Joji, I feel like we are going around in circles. But yes, you have understood "direct" and "indirect" correctly. This is pretty standard terminology. And yes, C++ references are also a form of indirection. – John Bollinger Jan 13 '22 at 01:15
  • Can I ask a related question - in high level languages like JavaScript or Python, if we have a variable and assign an object to it, is it more correct to say "Variable `foo` point to some object" than to say "Variable **has** (the value of) some object" since the variable here are indirect storages for the object, since on the variables's memory block only the memory address of that object is stored. – Joji Jan 13 '22 at 05:15
  • @Joji, I think we have already come more than far enough afield from the original question. – John Bollinger Jan 13 '22 at 14:34
0

Unlike Python, where variables are actually references to some memory location and all constants have an implicit address in memory, variables in C and C++ represent some memory location and most constants do not have an address but instead are embedded in the executable code.

If you do something like this in Python:

a = 123
b = 123
a = 456

The C code that most resembles what the above is doing is this:

const int value_123 = 123;
const int value_456 = 456;
const int *a = &value_123;
const int *b = &value_123;
a = &value_456;

So in Python parameters to functions are essentially always passed by reference, while in C and C++ parameters are always passed by value unless the reference token is used when declaring a parameter in C++ (i.e. void foo(int &a)).

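The only constants in C and C++ that have an address are string constants (string literals). In C++ they have type "array of const char"; in C the type is "array of char", but attempting to modify one is undefined behavior. So to refer to one you should use a const char * pointing to the first element of the array: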

const char *str = "Hello";
dbush
  • Hey thanks for the replies. When you said "Unlike Python, where variables are actually references to some memory location and all constants have an implicit address in memory, variables in C and C++ represent some memory location" I am not exactly sure what you meant by "a variable references to some memory location vs. a variable represents some memory location" - it seems like `represent` and `reference` are just synonyms. Also I tried to print the address of `*a` and `*b` but it still shows that they are different... – Joji Jan 11 '22 at 23:33
  • @Joji What I mean by "reference" is basically a pointer hidden by the language syntax, while by "represent" I mean that a variable and its address are essentially one and the same. Also, what *exactly* do you mean when you say you printed the addresses of `*a` and `*b`? Before the last line of the above C code, `a` and `b` will have the same value (i.e. the address of `value_123`) but the addresses of `a` and `b` themselves will be different because they are different variables. – dbush Jan 11 '22 at 23:45
  • right thanks... I meant the addresses of two variables of the pointer type are different. Is it true that two different variables in C++/C cannot have the same memory locations, (like what happens in the Python example)? Btw I still don't quite understand what exactly "reference" mean as in "variables are actually references to some memory location". Is this like the difference between a pointer type and other data type? – Joji Jan 12 '22 at 01:36