Tuple vs String vs frozenset. Immutable objects and the number of copies in memory

Question

a = "haha"
b = "haha"
print a is b  # this is True

The above code prints True. I've read that one of the reasons for this is because strings are immutable, so one copy in memory will be enough. But in the case of a tuple:

a = (1, 2, 3)
b = (1, 2, 3)
print a is b  # this is False

This will print False despite the fact that tuples are also immutable in Python. After doing some more research, I discovered that tuples can contain mutable elements, so I guess it makes sense to have multiple copies of tuples in memory if it's too expensive to figure out whether a tuple contains mutable objects or not. But when I tried it on a frozenset:

a = frozenset([1,2])
b = frozenset([1,2])
print a is b  # False

This will also print False. As far as I know, frozensets are themselves immutable and can only contain immutable objects (I tried to create a frozenset containing a tuple that contains a mutable list, but it's not allowed), and we can use == to check whether two frozensets are equal in value. So why does Python create two copies of them in memory?

Try `a = "$foo"` and `b = "$foo"`, you are seeing string interning http://stackoverflow.com/questions/28329498/why-does-a-space-effect-the-identity-comparison-of-equal-strings/28329522#28329522. You might also notice different behaviour using the code in functions — Padraic Cunningham, Oct 09 '15 at 20:53
You would also see the original behaviour with `a, b = "$foo","$foo"`, all optimizations are implementation specific and not something to rely on — Padraic Cunningham, Oct 09 '15 at 20:59
Hi @Padraic Cunningham. Your answer in the other post is very instructive, and I did get the same results using my python interpreter. However when I'm running it in a script with numbers larger than 256 and string with space in them as well as tuple assignment, all "is" comparisons returns true. — x7qiu, Oct 13 '15 at 02:23
yes, again they are implementation specific optimizations that the compiler makes, not something you can always rely on. I used to have a link saved that had a detailed explanation of all the different optimizations, I will post it if I can find it. — Padraic Cunningham, Oct 13 '15 at 10:56

Chad S. · Answer 1 · 2015-10-09T21:10:27.870

1

It's because of the way Python source is compiled to bytecode. When your program is run for the first time, the code is compiled into bytecode. When the compiler sees string (or some integer) literals in the code, it creates one object for that literal and uses a reference to that object wherever you typed the literal. But in the case of a tuple it's difficult (in some cases impossible) to determine that the tuples are the same, so it doesn't take the extra time to perform this optimization. It is for this reason that you should not generally use is for comparing objects.
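One way to observe the literal sharing described above is to compile a small snippet and inspect the constants stored in the resulting code object. This is a minimal sketch, assuming CPython 3 (the question itself uses Python 2 syntax); the file name `<example>` and the variable names are illustrative only, and which literals end up shared is an implementation detail that can differ between versions.

```python
# Sketch for CPython 3: constant handling is an implementation detail,
# so treat the results as observations, not guarantees.
source = 'a = "haha"\nb = "haha"\n'
code = compile(source, "<example>", "exec")

# Collect every constant in the compiled code object that equals "haha".
haha_constants = [c for c in code.co_consts if c == "haha"]
print(len(haha_constants))

# Run the compiled code and compare the two names by identity.
namespace = {}
exec(code, namespace)
print(namespace["a"] is namespace["b"])
```

On CPython both literals resolve to a single entry in `co_consts`, which is why the identity check succeeds for code compiled as one unit. Lines typed one at a time into the interactive interpreter are compiled separately, which may explain the differences reported in the comments.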

edited Oct 09 '15 at 21:10

answered Oct 09 '15 at 20:55

Chad S.


You *should* use `is` for comparing objects by their identity, you should not use it for testing equality – Padraic Cunningham Oct 09 '15 at 21:34
I can't think of many real world use-cases where checking object identity was needed or helpful. – Chad S. Oct 09 '15 at 21:47
there are plenty cases but that is irrelevant really, the point is *you should not generally use is for comparing objects* is pretty misleading. – Padraic Cunningham Oct 09 '15 at 21:49
I think the number of use cases for ```is``` is relevant to determining whether or not you should generally use ```is``` for comparing objects. – Chad S. Oct 09 '15 at 21:51
If you are checking if 2 objects are in fact the same object, you use `is` which bar certain things in numpy should always return True if a is b , testing for equality you use `==`, no real ambiguity there. You certainly never check for identity using `==`. – Padraic Cunningham Oct 09 '15 at 21:56
I know what ```is``` is for. I'm arguing that you shouldn't really ever need to use it. – Chad S. Oct 09 '15 at 22:00
When comparing with a singleton you should use is, because it is faster. – ByoTic Oct 09 '15 at 22:00
Are you saying you never saw `if x is None` `if x is not None` or as ByoTic commented using a sentinel value you would always use is `my_sentinel = object()` `if x is not my_sentinel`, there are numerous use cases. – Padraic Cunningham Oct 09 '15 at 22:02
I can understand that tuples are difficult to check if they contain mutable objects. But how about frozen set? – x7qiu Oct 13 '15 at 02:26
@user1248785 I don't understand your obstinance here. You are trying to use ```is``` in situations contrary to the design of the keyword and language. If you want to change the way the language works take it up with the dev mailing list. You asked why it is done the way it is, and I've tried to explain. There are costs with doing a lookup every time an immutable object is instantiated and little benefit to using ```is``` vs ```==``` in that situation. Just use the operator that does what you want (```==```). – Chad S. Oct 13 '15 at 16:07
@ChadSimmons I don't see how I am being obstinate in trying to understand WHY the language works the way it does. Having previously studied C, I find the way Python handles variables very interesting, and I simply want to know why. – x7qiu Oct 13 '15 at 20:02
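The `is None` and sentinel use cases raised in the comments above can be sketched as follows. This is an illustration only, written for Python 3; the names `_MISSING` and `describe` are hypothetical and do not come from the thread.

```python
# A unique sentinel object: nothing else can ever be identical to it.
_MISSING = object()  # hypothetical name, for illustration only


def describe(value=_MISSING):
    """Distinguish 'no argument given' from an explicit None."""
    if value is _MISSING:   # identity check against the sentinel
        return "no argument"
    if value is None:       # identity check against the None singleton
        return "explicit None"
    return "value: {!r}".format(value)


print(describe())
print(describe(None))
print(describe(0))
```

Here `is` is the right tool because the question being asked is "is this that exact object?", not "does this have an equal value?". For value comparison of strings, tuples, or frozensets, `==` remains the operator to use.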

score 1 · Accepted Answer · answered Oct 09 '15 at 20:57

Your sentence "I've read that one of the reasons for this is because strings are immutable, so one copy in memory will be enough." is correct, but it does not hold all the time. For example, if you do the same with the string "dgjudfigur89tyur9egjr9ivr89egre8frejf9reimfkldsmgoifsgjurt89igjkmrt0ivmkrt8g,rt89gjtrt", it won't be the same object (at least on my version of Python). The same phenomenon can be reproduced with integers, where 256 will be the same object but 257 won't. It has to do with the way Python caches objects: it only saves "simple" objects. Each type has its own criteria: for strings, containing only certain characters; for integers, falling within a certain range.
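The caching behaviour described above can be probed with values built at run time, so that compile-time sharing of literals does not interfere. This is a rough sketch assuming CPython 3; the helper name `make_int` is made up for the example, and `sys.intern` is the Python 3 spelling (Python 2 has a built-in `intern`). Identity results for uncached values are implementation details, so they are printed rather than relied on.

```python
import sys


def make_int(text):
    """Parse an integer at run time (hypothetical helper for this sketch)."""
    return int(text)


small_a, small_b = make_int("256"), make_int("256")
big_a, big_b = make_int("257"), make_int("257")

# CPython keeps a cache of small integers, so equal small values
# are usually the very same object.
print(small_a is small_b)
# Outside the cached range, equal values are typically separate objects.
print(big_a is big_b)
print(big_a == big_b)  # equality holds regardless of caching

# Strings built at run time are not interned automatically.
s1 = "".join(["ha", "ha"])
s2 = "".join(["ha", "ha"])
print(s1 == s2)
# sys.intern returns one canonical object per distinct string value.
print(sys.intern(s1) is sys.intern(s2))
```

The equality comparisons are the only results the language guarantees here; the identity results reflect what the interpreter happens to cache.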

ByoTic


Hi @ByoTic. I tried with number larger than 256 and the string "dgjudfigur89tyur9egjr9ivr89egre8frejf9reimfkldsmgoifsgjurt89igjkmrt0ivmkrt8g,rt89gjtrt" . The "is" operator returns false in python interpreter but when running them in a script the results returned are all "True". I'm using python 2.7 on a Mac. – x7qiu Oct 13 '15 at 02:25
It happened to me too actually. Might be some kind of optimization (saving the literal in the compiled code itself) and in this case the interpreter handles it differently. – ByoTic Oct 13 '15 at 21:08
Python maintains an [internal integer object array](https://medium.com/@nick3499/python-internal-integer-object-array-9abe1f0efa82) for all integers from **-5** to **256**. – noobninja Sep 20 '17 at 17:24
