Console assigns different IDs to identical immutable objects

Question

I'm using Python 3.10.1. When I run the following code as a .py file, it prints True:

a = (1, 2)
b = (1, 2)
print(a is b)

However, when I type the same lines of code into the interactive Python console, I get False. Furthermore, when I assign to both variables on the same line, i.e.

a = (1, 2); b = (1, 2)
print(a is b)

once again I get True. Is there some explanation for this behavior or is it a bug?

The `is` operator does not do the same thing as the `==` operator. You're confusing identity with equality. Immutability isn't really a consideration. — martineau, Jan 15 '22 at 17:40
When you create a new tuple, Python does not search *every tuple previously created* to see if it's a duplicate - that would take a long time, for very little chance of doing any good. But when the duplicate is created in the *same compilation* (same interactive statement, or same script), the compiler can reasonably notice the duplication. — jasonharper, Jan 15 '22 at 17:42
You just should not write code which cares about the identity of immutable objects. The language spec makes almost no guarantees, even calling the `tuple` constructor isn't guaranteed to give you a fresh object. — kaya3, Jan 16 '22 at 09:45

S.B · Answer 1 · 2022-01-16T09:29:03.533

From the documentation:

A Python program is constructed from code blocks. A block is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition. Each command typed interactively is a block. A script file (a file given as standard input to the interpreter or specified as a command line argument to the interpreter) is a code block. A script command (a command specified on the interpreter command line with the -c option) is a code block. A module run as a top level script (as module __main__) from the command line using a -m argument is also a code block. The string argument passed to the built-in functions eval() and exec() is a code block.

That's because your first snippet sits inside a single code block (a module) and is executed as a unit. In the interactive shell, when you enter the assignments as two separate commands, they are in different code blocks.

Python can reuse the same object for equal values of some immutable types, such as tuples, within a code block. That's just an optimization, not a bug.
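To see the effect without switching between a file and the console, here is a minimal sketch that uses `compile()` to stand in for "one code block". The helper name `run` and the constant `SOURCE_ONE_BLOCK` are made up for this illustration. The result for the same-block case is what CPython does in practice, not something the language guarantees, and the separate-block result can differ between versions and builds, so it is only printed.

```python
# Sketch: each compile() call stands in for one code block.
SOURCE_ONE_BLOCK = "a = (1, 2)\nb = (1, 2)\n"


def run(source):
    """Compile and execute `source` as one code block; return its namespace."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


# Both assignments compiled together, like a script file.
ns = run(SOURCE_ONE_BLOCK)
same_block = ns["a"] is ns["b"]

# Each assignment compiled on its own, like two separate console commands.
first = run("a = (1, 2)\n")["a"]
second = run("b = (1, 2)\n")["b"]
separate_blocks = first is second

print("same block:     ", same_block)       # implementation detail of CPython
print("separate blocks:", separate_blocks)  # not guaranteed either way
print("equal by value: ", first == second)  # always True
```

Only the last line is something you can rely on: the two tuples always compare equal, whatever their identities turn out to be.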

Let's examine it with functions (remember, a function body is also a code block) and integers bigger than 256 this time:

# inside a python file

def fn1():
    a = 1000
    b = 1000
    print("id of 'a'", id(a))
    print("id of 'b'", id(b))


def fn2():
    c = 1000
    print("id of 'c'", id(c))
    
fn1()
fn2()

# id of 'a' 1738701965680
# id of 'b' 1738701965680
# id of 'c' 1738701965680

Now:

# inside interactive mode

>>> def fn1():
...     a = 1000
...     b = 1000
...     print("id of 'a'", id(a))
...     print("id of 'b'", id(b))
...
>>> def fn2():
...     c = 1000
...     print("id of 'c'", id(c))
...
>>> fn1()
id of 'a' 2441294813616
id of 'b' 2441294813616
>>> fn2()
id of 'c' 2441294813264
>>>

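When fn1 and fn2 are inside a module, they belong to the same code block (the module), so a, b and c all refer to the same object. In interactive mode they are not in a module: each definition is a separate command, so fn1 and fn2 are handled separately and c gets a different object. Within fn1, however, a and b still point to the same object.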
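One way to check why a and b share an object inside fn1 is to look at the constants stored on the function's code object. This is a small sketch that relies on CPython's `__code__.co_consts` attribute; the variable names `consts` and `occurrences` are just for illustration, and the exact contents of the tuple vary between Python versions.

```python
def fn1():
    a = 1000
    b = 1000
    return a is b


# The literal 1000 appears twice in the source, but the compiled function
# stores it only once, so both names are loaded from the same constant.
consts = fn1.__code__.co_consts
occurrences = consts.count(1000)

print(consts)
print("times 1000 is stored:", occurrences)
print("a is b inside fn1:", fn1())
```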

Answer to comment:

Why does Python use the memory addresses of immutable objects as their IDs rather than hashing their content so that the same value always yield the same ID?

Take a look at this answer, particularly where Martijn says: "3. The code object is not referenced by anything, reference count drops to 0 and the code object is deleted. As a consequence, so is the string object." So in the REPL, for the string to keep the same ID, the memory of the code object would have to never be freed. I think this is the main downside, and the reason the core developers didn't go that way.
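It may also help to separate the two notions in code. Value-based comparison already exists through `==` and `hash()`, while `id()` only answers whether two names refer to the same object. The sketch below builds the tuples at runtime so that the compiler cannot merge them; the variable names are illustrative, and whether `a is b` comes out True or False is deliberately left unasserted because the language does not promise either.

```python
# Build two equal tuples at runtime rather than from literals.
a = tuple([1, 2])
b = tuple([1, 2])

equal = a == b                   # compares contents
same_hash = hash(a) == hash(b)   # equal values hash equally
# `is` and id() always agree for objects that are alive at the same time.
identity_matches_id = (a is b) == (id(a) == id(b))

print("a == b:            ", equal)
print("hash(a) == hash(b):", same_hash)
print("a is b:            ", a is b)
```

If your code needs to know whether two tuples hold the same values, `==` is the right tool; `is` and `id()` answer a different question.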

Why does Python use the memory addresses of immutable objects as their IDs rather than hashing their content so that the same value always yield the same ID? — Ron Inbar, Jan 15 '22 at 18:41
ids must be globally unique. Even if you can write a hashing algorithm that *never* outputs the same hash for two different values (impossible), if you have different algorithms for computing the ids of different types of object then how can you guarantee uniqueness? Using hashing would also be solving a non-problem anyway because you just should not care what the id of a tuple is. — kaya3, Jan 16 '22 at 09:50
