Why are Python references "not" their string counterparts?

Question

Python's is operator returns True when both operands refer to the same object.

With this in mind, I ran a few tests and got the following results for various values of s.

# I set s to "a", 5, "hello", and "hello there"

s is "a" => True
s is 5 => True
s is "hello" => True
s is "hello there" => False

Why does the last one return false?

Roland Smith · Accepted Answer · 2019-07-25T17:08:54.040

CPython "interns" small integers and some strings as an optimization. AFAIK, this is not part of the language, but a detail of this specific implementation.

When CPython starts, a range of small integer objects is created, even though your program hasn't referenced them yet.

So every time you say e.g. s = 5, the s variable just becomes a reference to the pre-allocated integer object 5.

For example:

In [14]: for i in range(10): 
    ...:     print(i, id(i)) 
    ...:                                                                                                 
0 34375408656
1 34375408688
2 34375408720
3 34375408752
4 34375408784
5 34375408816
6 34375408848
7 34375408880
8 34375408912
9 34375408944

These IDs stay the same for the whole lifetime of the interpreter, and on my machine even across different CPython sessions. (This is on 64-bit UNIX; the numbers will likely be different on other machines, and on systems that randomize the address space they can also change from one session to the next.)
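The pre-allocated range is limited. Here is a minimal sketch, assuming CPython and its current small-integer cache of -5 through 256 (an implementation detail that may change). The values are parsed with int() at run time so that the compiler cannot share them as constants:

```python
# Assumes CPython, whose small-integer cache currently covers -5..256.
# Parsing from strings at run time avoids compile-time constant sharing.
a = int("256")
b = int("256")
small_shared = a is b   # both names refer to the cached 256 object

c = int("257")
d = int("257")
large_shared = c is d   # two separately allocated int objects

print(small_shared, large_shared)
print(c == d)           # value equality holds regardless of identity
```

On CPython this should report that the two 256 objects are the same and the two 257 objects are not, while == is true in both cases.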

Compare:

In [1]: s = "hello there"                                                                                
Out[1]: 'hello there'

In [2]: id(s)                                                                                            
Out[2]: 34513947376

In [3]: id("hello there")                                                                                
Out[3]: 34517432752

In [4]: id("hello there")                                                                                
Out[4]: 34527873968

In [5]: id("hello there")                                                                                
Out[5]: 34518225712

In [6]: id("hello there")                                                                                
Out[6]: 34512957808

Apparently, the string "hello there" is not covered by this interning mechanism (it contains a space, so it does not look like an identifier), so every evaluation at the prompt creates a new object.
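If you really need identical objects for equal strings, you can ask for interning explicitly with sys.intern() from the standard library. A small sketch, assuming CPython; the strings are built at run time so that they start out as separate objects:

```python
import sys

# Build the strings at run time so they start out as separate objects.
a = " ".join(["hello", "there"])
b = " ".join(["hello", "there"])

print(a == b)   # the values are equal
print(a is b)   # but these are two distinct objects in CPython

# sys.intern() returns the one canonical object for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)
```

For ordinary comparisons == is still the right tool; sys.intern() is mostly useful as a memory or dictionary-lookup optimization.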

score 1 · Answer 2 · answered Jul 25 '19 at 16:56

Your question is a little unclear but it sounds like you're doing something like the following:

for s in [ 'a', 5, 'hello', 'hello there' ]:
    print(s is 'a')
    print(s is 5)
    print(s is 'hello')
    print(s is 'hello there')

Is that right?

Behind the scenes, CPython "interns" small integers, 0- and 1-length strings, and string constants that look like identifiers (that is, they are made of letters, digits, and underscores) as an optimization. So, the first three objects in the list are successfully interned and essentially reused directly in the print statements (so is => True), while the last one ('hello there') misses the interning logic and is created separately outside and inside the loop (so is => False).
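Note that this rule concerns string constants in the source code, not strings assembled while the program runs. A minimal sketch, assuming CPython (the variable names are just for illustration); it also shows that == gives the expected answer for every value from the question:

```python
literal = "hello"                 # constant in the source code
built = "".join(["hel", "lo"])    # same text, assembled at run time

same_value = built == literal     # compares the contents
same_object = built is literal    # compares object identity
print(same_value, same_object)

# == behaves predictably for all the values from the question.
results = [s == "hello there" for s in ["a", 5, "hello", "hello there"]]
print(results)
```

Since Python 3.8, CPython also emits a SyntaxWarning when is is used with a literal, precisely because the result depends on implementation details like these.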

There's an interesting article about interning here: Python Interning Description
