Which integers and strings get automatically interned in Python is implementation-specific and has changed between versions.
Here are some principles and limits that seem to hold at least for my current installation (CPython 3.10.7):
All integers in the range [-5, 256] are automatically interned:
>>> x = 256
>>> y = 256
>>> x is y
True
>>> x = 257
>>> y = 257
>>> x is y
False
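The session above relies on the REPL compiling each line separately. As a minimal sketch for probing the same cache from a script, the hypothetical helper `is_cached` below builds both integers at runtime with `int(str(n))`, so the compiler cannot share a single constant between them. The results it prints are specific to the interpreter you run it on; nothing here is guaranteed by the language.

```python
def is_cached(n: int) -> bool:
    """Return True if two independently created ints equal to n are the same object."""
    # Build both values at runtime so the compiler cannot share one constant.
    a = int(str(n))
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # Probe values around the edges of the range mentioned above.
    for n in (-6, -5, 0, 256, 257):
        print(n, is_cached(n))
```

On a CPython build that behaves as described above, the values inside [-5, 256] should report True and the others False.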
CPython (version >= 3.7) also automatically interns string constants if they are at most 4096 characters long and consist only of ASCII letters, digits, and underscores. (In CPython versions <= 3.6, the limit was 20 characters.)
>>> x = "foo"
>>> y = "foo"
>>> x is y
True
>>> x = "foo bar"
>>> y = "foo bar"
>>> x is y
False
>>> x = "A" * 4096
>>> y = "A" * 4096
>>> x is y
True
>>> x = "A" * 4097
>>> y = "A" * 4097
>>> x is y
False
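The rule described above can be written down as a small predicate. This is only a sketch of the stated rule, not a query of the interpreter: `looks_auto_internable`, `_NAME_CHARS`, and `MAX_LEN` are hypothetical names, and the 4096 limit is the figure quoted above for CPython >= 3.7.

```python
import string

# Characters allowed by the rule described above (an assumption based on
# observed CPython 3.10 behaviour, not a language guarantee).
_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")
MAX_LEN = 4096  # limit quoted in the text for CPython >= 3.7


def looks_auto_internable(s: str) -> bool:
    """Heuristic: does s satisfy the length and character rule described above?"""
    return len(s) <= MAX_LEN and all(ch in _NAME_CHARS for ch in s)


if __name__ == "__main__":
    for sample in ("foo", "foo bar", "5myvar", "A" * 4096, "A" * 4097):
        print(repr(sample[:10]), len(sample), looks_auto_internable(sample))
```

The predicate only tells you whether a string matches the rule. Whether the interpreter actually interned it still depends on how the string was created.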
In some versions the rule was apparently to intern strings looking like valid identifiers (e.g., not strings starting with a digit), but that does not appear to be the rule in my installation:
>>> x = "5myvar"
>>> y = "5myvar"
>>> x is y
True
>>> 5myvar = 5
  File "<stdin>", line 1
    5myvar = 5
    ^
SyntaxError: invalid decimal literal
Additionally, automatic interning happens at compile time, not at runtime, so a string built at runtime is not interned even if an equal literal is:
>>> x = "bar"
>>> y = "".join(["b","a","r"])
>>> x
'bar'
>>> y
'bar'
>>> x is y
False
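The practical consequence is that `==` is the right way to compare string contents, while `is` only tells you whether two names point at the same object. A minimal sketch follows; the variable names `literal` and `built` are just for illustration:

```python
import sys

literal = "bar"                   # constant, known at compile time
built = "".join(["b", "a", "r"])  # created at runtime

print(literal == built)  # equality compares contents
print(literal is built)  # identity; the result depends on the implementation
# sys.intern() returns the canonical copy, so equal strings map to one object.
print(sys.intern(built) is sys.intern(literal))
```

Only the first and last comparisons are something you can rely on. The middle one is the implementation detail this answer is about.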
Relying on automatic string interning is risky, because it depends on the implementation, which may change. To ensure a string is interned, use the sys.intern() function:
>>> x = "a string which would not normally be interned!"
>>> y = "a string which would not normally be interned!"
>>> x is y
False
>>> import sys
>>> x = sys.intern(x)
>>> y = sys.intern(y)
>>> x is y
True
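One place where explicit interning can be useful is when many equal strings are produced at runtime, for example while parsing input. The sketch below assumes that use case; `intern_all` is a hypothetical helper, and the "user-N" strings are made-up sample data.

```python
import sys


def intern_all(words):
    """Return a list in which equal strings are replaced by one shared interned object."""
    return [sys.intern(w) for w in words]


# Build equal strings at runtime so they start out as separate objects.
raw = ["-".join(["user", str(i % 3)]) for i in range(9)]
shared = intern_all(raw)

# Count distinct objects (by id), not distinct values.
print(len({id(w) for w in raw}), len({id(w) for w in shared}))
```

After interning, the list holds one object per distinct value, so identity checks between its elements agree with equality checks. Note that sys.intern() accepts only exact str instances, not subclasses.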