Eric's answer is very good. I will add some practical cases using C as the base language for my answer.
Take the following code:
#include <stdio.h>

int main() {
    int a = 123;
    int b = 123;
    printf("%d", a);
    printf("%d", b);
}
If you compile this code with the gcc 11.2 x86-64 C compiler without optimization (Intel syntax), the following assembly is produced:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 123
mov DWORD PTR [rbp-8], 123
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, DWORD PTR [rbp-8]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
As you can see, storage is provided for the two variables: each one gets its own stack slot ([rbp-4] and [rbp-8]).
Now, if I use the -O optimization flag, the following assembly is produced:
.LC0:
.string "%d"
main:
sub rsp, 8
mov esi, 123
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov esi, 123
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
add rsp, 8
ret
The compiler just uses the 123 literal: because no changes are made to those variables, it figures they can be treated as constant values, and no storage is needed.
That doesn't mean the literal exists in the ether. It still has to be embedded in the assembly, here as an immediate operand of the mov instructions.
In Python everything is an object, even primitive types. Notice that print(id(a)) and print(id(123)) will render the same result (at least in CPython, which caches small integers). In both cases you get the identifier of the specific object 123, a pointer or reference to it, if you will, and nothing related to the variable to which it's assigned.
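C and C++, on the other hand, are not like Python: int literals are not objects and there are no references to them, just the bits. For the 123 literal example, let's try to print its address. You can't write &123, because a literal is not an lvalue, so the closest we can get is casting the value itself to a pointer: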
printf("%p\n", (void*)123);
What happens here:
mov esi, 123 // sets ESI register to 123
mov edi, OFFSET FLAT:.LC0 // unimportant, gets the specifier string
mov eax, 0 // sets EAX register to 0
call printf // prints the literal
The output:
0x7b // 123 hexadecimal
Now let's also print the address of a variable that has 123 assigned:
int a = 123;
printf("%p", (void*)&a);
Looking at the assembly we can spot the difference:
mov DWORD PTR [rsp+12], 123 // moving `123` literal to its address
lea rsi, [rsp+12] // placing the address in the register
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf // printing the address
In this case the address of the variable is printed, as expected. The literal was placed in the memory location where the variable a lives, so we can print its address.
If you have two variables with the same value, they're probably going to have different addresses, but if the compiler finds a way to have only one address or no memory storage at all for the two variables, and still produce the desired outcome, there is no rule preventing it.
There are few constraints in the language standard on how a compiler achieves this: it just has to follow the standard's rules and produce a program that behaves in a consistent, defined manner in all circumstances, provided that the program is correctly coded.
The assignment of a to b by itself doesn't change much, nor does the fact that the literal is a string. There will probably be only one copy of the same literal (especially considering that string literals assigned to pointers are effectively immutable), unless other constraints prevent it.
Side note:
C and C++ are different languages. I want to point this out explicitly because, more often than not, C++ is mistakenly regarded as a superset of C. That may have been the case in the early years, but it is not true today: these are very different languages, despite the fact that C++ retains compatibility, for the most part, with C code.