37

It seems that 2 is 2 and 3 is 3 will always be true in python, and in general, any reference to an integer is the same as any other reference to the same integer. The same happens to None (i.e., None is None). I know that this does not happen to user-defined types, or mutable types. But it sometimes fails on immutable types too:

>>> () is ()
True
>>> (2,) is (2,)
False

That is: two independent constructions of the empty tuple yield references to the same object in memory, but two independent constructions of identical one-(immutable-)element tuples end up creating two identical objects. I tested, and frozensets work in a manner similar to tuples.

What determines if an object will be duplicated in memory or will have a single instance with lots of references? Does it depend on whether the object is "atomic" in some sense? Does it vary according to implementation?

fonini

2 Answers

38

Python has some types that it guarantees will only have one instance. Examples of these instances are None, NotImplemented, and Ellipsis. These are (by definition) singletons and so things like None is None are guaranteed to return True because there is no way to create a new instance of NoneType.

It also supplies a few doubletons[1]: True and False[2]. All references to True point to the same object. Again, this is because there is no way to create a new instance of bool.

The above things are all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that store some instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not -- depending on their optimization strategies. Some examples that fall into this category in CPython are small integers (-5 through 256), the empty tuple and the empty frozenset.
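As a rough illustration of the guaranteed cases, here is a minimal sketch assuming Python 3. The variable names are mine, chosen only for the example:

```python
# Identity checks that the language itself guarantees (not CPython-specific).
none_type = type(None)

# Calling NoneType does not create a new object; it hands back None.
print(none_type() is None)       # True

# bool() can only ever return one of the two existing bool objects.
print(bool(1) is True)           # True
print(bool([]) is False)         # True

# The literal ... always evaluates to the single Ellipsis object.
dots = ...
print(dots is Ellipsis)          # True
```

The cached small integers and the empty tuple are a different matter: whether two of those are identical is up to the implementation.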

Finally, CPython interns certain immutable objects during parsing...

e.g. if you run the following script with CPython, you'll see that it prints True:

def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())

This seems really odd. The trick that CPython is playing is that whenever it constructs the function foo, it sees a tuple literal that contains other simple (immutable) literals. Rather than create this tuple (or its equivalents) over and over, CPython just creates it once. There's no danger of that object being changed since the whole deal is immutable. This can be a big win for performance where the same tight loop is called over and over. Small strings are interned as well. The real win here is in dictionary lookups. Python can do a (blazingly fast) pointer compare first and only fall back on slower string comparisons when checking hash collisions. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
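If you want to poke at this yourself, here is a small sketch, assuming CPython 3. The co_consts attribute is where CPython keeps a code object's constants, and sys.intern is the standard-library way to intern a string explicitly. The names runtime_built and canonical are just illustrative:

```python
import sys

def foo():
    return (2,)

# On CPython the compiler stores the tuple among the function's constants,
# so every call hands back that one stored object (an implementation detail).
print(foo.__code__.co_consts)

# Explicit interning: sys.intern returns one canonical object per string value.
runtime_built = ''.join(['spam', '_', 'eggs'])  # assembled at run time
canonical = sys.intern('spam_eggs')
print(sys.intern(runtime_built) is canonical)   # True
```

The exact contents of co_consts vary between CPython versions, so treat that first print as something to look at rather than rely on.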


[1] I might have just made up that word... but hopefully you get the idea.
[2] Under normal circumstances, you don't need to check if the object is a reference to True -- usually you just care if the object is "truthy", i.e. whether the branch under if some_instance: ... will execute. But I put that in here just for completeness.


Note that is can be used to compare things that aren't singletons. One common use is to create a sentinel value:

sentinel = object()
item = next(iterator, sentinel)  # `iterator` must be an iterator, e.g. iter(some_list)
if item is sentinel:
    pass  # iterator exhausted.

Or:

_sentinel = object()
def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.
        pass
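Putting both sentinel uses together, here is a complete sketch you can run. The helper first_item and the name _MISSING are hypothetical, made up for this example:

```python
_MISSING = object()  # unique object; nothing else can be identical to it

def first_item(iterable, default=_MISSING):
    """Return the first item of iterable, or default if it is empty."""
    item = next(iter(iterable), _MISSING)
    if item is _MISSING:
        # The iterable was empty.
        if default is _MISSING:
            raise ValueError("empty iterable and no default given")
        return default
    return item

print(first_item([None, 1]))         # None (a real item, not "missing")
print(first_item([], default=None))  # None (the caller's explicit default)
```

Because None is a perfectly good item and a perfectly good default here, only an identity check against a private object can tell "nothing was there" apart from "None was there".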

The moral of this story is to always say what you mean. If you want to check if a value is another value, then use the is operator. If you want to check if a value is equal to another value (but possibly distinct), then use ==. For more details on the difference between is and == (and when to use which), consult one of the following posts:


Addendum

We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the is operator).

String "interning" and dictionary lookups.

Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.

import timeit

interned = 'foo'
not_interned = (interned + ' ').strip()

assert interned is not not_interned


d = {interned: 'bar'}

print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))


####################################################

interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()

d[interned_long] = 'baz'

assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))

The exact values here shouldn't matter too much, but on my computer, the short strings show about 1 part in 7 faster. The long strings are almost 2x faster (because the string comparison takes longer if the string has more characters to compare). The differences aren't quite as striking on python3.x, but they're still definitely there.

Tuple "interning"

Here's a small script you can play around with:

import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

assert foo_tuple() is foo_tuple()

number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number)

print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)


def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]

t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)

This one is a bit trickier to time (and I'm happy to take any better ideas how to time it in comments). The gist of this is that on average (and on my computer), a tuple takes about 60% as long to create as a list does. However, foo_tuple() takes on average about 40% the time that foo_list() takes. That shows that we really do gain a little bit of a speedup from these interns. The time savings seem to increase as the tuple gets larger (creating a longer list takes longer -- The tuple "creation" takes constant time since it was already created).
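One possible way to make the timing a bit less noisy is to repeat each measurement and keep the fastest run, which is what the timeit documentation suggests. This is only a sketch: best_time is a helper name I made up, and the number and repeat values are arbitrary:

```python
import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

def best_time(func, number=200_000, repeat=5):
    """Return the fastest of several timing runs, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat))

t_tuple = best_time(foo_tuple)
t_list = best_time(foo_list)
print(t_tuple, t_list, t_tuple / t_list)
```

Passing the function object directly also avoids the from __main__ import ... setup strings.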

Also note that I've called this "interning". It actually isn't (at least not in the same sense the strings are interned). We can see the difference in this simple script:

def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'

print(foo_tuple() is foo_tuple())  # True
print(foo_tuple() is bar_tuple())  # False

print(foo_string() is bar_string())  # True

We see that the strings are really "interned" -- Different invocations using the same literal notation return the same object. The tuple "interning" seems to be specific to a single line.

Community
mgilson
  • Great including the sentinel, that's the most common (and practical) non-None use of identity that I've seen in Python. And the reason it's useful is in the case that you might have `None` as an input, so while one is immutable and the other isn't, they are related in that way. – Jared Goguen Apr 28 '16 at 00:39
  • @JaredGoguen -- Yeah, the other option is to just do `next(iterable)` ... and catch the `StopIteration`. In some cases, I'd argue that's better, but ... I've been known to use the sentinel object a time or two. I guess it depends on the rest of the surrounding code. – mgilson Apr 28 '16 at 00:41
  • Definitely depends on the context, but I would argue that the `is` is generally more readable, reinforced by the fact that it's a common pattern. – Jared Goguen Apr 28 '16 at 00:46
  • Great answer. Shouldn't it start "Python has some *objects*…"? Your examples are all objects — not types. – Neil G Apr 28 '16 at 01:25
  • Also, you can add interned strings to the category of objects that are reused. – Neil G Apr 28 '16 at 01:27
  • @NeilG -- I've added a blurb about interning. Feel free to look it over and comment or edit. – mgilson Apr 28 '16 at 01:49
  • Wow, I didn't even know that CPython interned tuples. – Neil G Apr 28 '16 at 01:53
  • Ignoring everything else, the `is` and `==` distinction is the most important part. – Pharap Apr 28 '16 at 03:37
  • @Pharap -- Probably, but that part has been [asked](http://stackoverflow.com/q/132988/748858) and [answered](http://stackoverflow.com/questions/14247373/python-none-comparison-should-i-use-is-or/14247383#14247383) before (a couple times) ... This question is different enough that I think it isn't a dupe, but I didn't want to focus on the already answered stuff... – mgilson Apr 28 '16 at 03:47
  • @NeilG actually Python does not intern tuples. It's the peephole optimizer that, when compiling, makes sure that all literals point to the same tuple value. The difference is that peephole optimizations store the value in the code while interning builds a table structure. So using the same tuple literal in different files should yield different objects, while using a string literal produces the same object. (Just checked and indeed it behaves this way) – Bakuriu Apr 28 '16 at 10:16
  • @bakuriu. Interesting okay. However, CPython could in theory intern tuples if it wanted to? – Neil G Apr 28 '16 at 14:25
21

It varies according to implementation.

CPython caches some immutable objects in memory. This is true of "small" integers like 1 and 2 (CPython keeps -5 through 256 cached). CPython does this for performance reasons; small integers are commonly used in most programs, so it saves memory to only have one copy created (and is safe because integers are immutable).

This is also true of "singleton" objects like None; there is only ever one None in existence at any given time.

Other objects (such as the empty tuple, ()) may be implemented as singletons, or they may not be.

In general, you shouldn't necessarily assume that immutable objects will be implemented this way. CPython does so for performance reasons, but other implementations may not, and CPython may even stop doing it at some point in the future. (The only exception might be None, as x is None is a common Python idiom and is likely to be implemented across different interpreters and versions.)
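To see why you shouldn't lean on this, here is a small sketch. The integers are built with int() at run time so the compiler can't quietly share a constant, and the variable names are only illustrative:

```python
# Build equal integers at run time so no compile-time constant is shared.
small_a, small_b = int("100"), int("100")
large_a, large_b = int("100000"), int("100000")

# Equality is guaranteed by the language.
print(small_a == small_b, large_a == large_b)  # True True

# Identity is not: these two results depend on the implementation.
print(small_a is small_b)
print(large_a is large_b)
```

On CPython I'd expect the small pair to be identical and the large pair not, but nothing in the language promises either result.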

Usually you want to use == instead of is. Python's is operator isn't used often, except when checking to see if a variable is None.
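A quick sketch of the difference, using lists because two separately created mutable objects are always distinct:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # another name for the same object

print(a == b)   # True
print(a is b)   # False
print(a is c)   # True

# Mutating through one name is visible through the other name only
# when both refer to the same object.
c.append(4)
print(a)        # [1, 2, 3, 4]
print(b)        # [1, 2, 3]
```

So == answers "do these hold the same value?", while is answers "is this literally the same object?".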

mipadi
  • Integers from -5 to 256 are singletons and cached in memory. – Mazdak Apr 27 '16 at 19:19
  • Also, it doesn't really make sense to be comparing identity, e.g. `id(a) == id(b)`, for immutable types (besides `None` and `object()`). An equality comparison tells you everything you need to know and, since the object is immutable, it's not like a change in one will affect another. – Jared Goguen Apr 27 '16 at 19:22
  • This is a good answer, however it feels like you are treating the singleton-ness of `None` on the same plane as the singleton-ness of `1`. And those are completely different things. It is _guaranteed_ that there is only one `None` and that `is` comparisons of `None` with `None` will always be `True`. There are other objects in here too (`NotImplemented` and `Ellipsis` come to mind immediately). However, as you have said, the fact that `1` is a singleton is an implementation detail. – mgilson Apr 27 '16 at 19:24