Why is Python's `intern` built-in only for strings? It should be possible to extend `intern` to classes that are hashable and comparable, right?

-
You could create an object cache like `intern` for immutable objects. – Peter Graham Aug 03 '11 at 23:15
-
@Peter: you're right. The advantage of `intern` is that all of the code for that is automatically generated, and as a bonus, it's in fast C++. – Neil G Aug 04 '11 at 21:11
-
@NeilG I don't think so. AFAIK CPython is completely written in C, not C++. – glglgl May 26 '13 at 08:45
3 Answers
The purpose of interning things is to be able to compare them by comparing their memory address; you ensure that you never create two objects with the same value (when the program requests the creation of a second object with the same value as an existing object, it instead receives a reference to the pre-existing object). This requires that the things you're interning be immutable; if the value of an interned object could change, comparing them by address isn't going to work.
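As a small illustration of the identity comparison described above, here is a sketch using `sys.intern`, which is where the built-in lives in Python 3 (in Python 2 it is the plain `intern` built-in). The variable names are only for illustration.

```python
import sys

# Build two equal strings at runtime so they are separate objects
# rather than one shared literal.
parts = ["inter", "ning"]
a = "".join(parts)
b = "".join(parts)

print(a == b)    # equal by value

# sys.intern returns the canonical copy for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)

print(ia is ib)  # same object, so comparing addresses is enough
```

Once both references have gone through `sys.intern`, `is` gives the same answer as `==` for those strings, which is the whole point of interning.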
In Python, it's not possible to enforce the immutability of user-defined class instances, so it wouldn't be safe to intern them. I suspect that's the main theoretical reason intern doesn't cover class instances.
Other built-in immutable types are either already comparable in a single machine-level operation (int, float, etc.) or are immutable containers that can hold mutable values (tuple, frozenset). There's no need to intern the former, and the latter can't be safely interned either.
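To make the point about user-defined instances concrete, here is a minimal sketch of an intern-like cache for a class. `Point` and its `_cache` attribute are hypothetical names invented for this example, and this is only one possible way to build such a cache, not something Python provides.

```python
import weakref

class Point:
    """Hypothetical value class; nothing stops callers from mutating it."""

    # Maps (x, y) to the one live instance created for that value.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, x, y):
        key = (x, y)
        obj = cls._cache.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj.x, obj.y = x, y
            cls._cache[key] = obj
        return obj

p = Point(1, 2)
q = Point(1, 2)
print(p is q)   # both names refer to the single cached instance

# Mutating through one name changes the "interned" value for everyone.
q.x = 99
print(p.x)      # 99, although p was created as Point(1, 2)
```

After the assignment, `Point(1, 2)` still hands back the same object, whose `x` is now 99. The cache has no way to notice the change, which is why identity comparison stops being trustworthy once instances can be mutated.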
-
+1. "In Python, it's not possible to enforce the immutability of user-defined class instances" -- how unfortunate. – ShreevatsaR Nov 26 '13 at 19:17
There is no technical reason that, say, a tuple could not be interned, though I would imagine that in the real world this is of little value compared to string literals, and it would be of even less real-world value with user-defined types. Making it work is probably not considered worth the effort.

-
The contents of a tuple can be mutable. Interning tuples could cause weird behaviour. E.g. `a, b = ([],), ([],); a[0].append('foo')` would have different results depending on whether `a` and `b` were different tuples, which depends on the tuple interning implementation. Apparently some implementations of Fortran [did something similar](http://stackoverflow.com/questions/1995113/strangest-language-feature/1995476#1995476). – Peter Graham Aug 03 '11 at 23:23
-
@Peter: The whole idea of interning is that you can do it only when there are no side effects like the one you describe. Interning a tuple would obviously have to check to make sure all elements are the same objects as a tuple already interned, not merely equal. It could in fact try to intern the elements first (this would be more useful if numeric types were also made internable). In your example, the tuples simply would not be able to share a reference if they didn't contain the same list. There is no technical reason why Python could not be amended to do all this. – kindall Aug 04 '11 at 16:29
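A rough sketch of the identity-based tuple interning described in the comment above might look like this. `intern_tuple` and `_tuple_cache` are hypothetical names; nothing like this exists in Python itself, and the sketch skips the step of interning the elements first.

```python
_tuple_cache = {}  # hypothetical intern table keyed on element identity

def intern_tuple(t):
    """Return a canonical tuple whose elements are the same objects as t's."""
    key = tuple(id(item) for item in t)
    # The cached tuple keeps its elements alive, so their ids stay valid.
    return _tuple_cache.setdefault(key, t)

shared = []
a = intern_tuple((shared,))
b = intern_tuple((shared,))   # same list object inside, so same tuple back
c = intern_tuple(([],))       # equal but distinct list, so a separate tuple

print(a is b)
print(a is c)

c[0].append("foo")
print(a[0])   # still [], because a and c never shared a list
```

Because the key is built from element identities rather than equality, two tuples only share a reference when they hold the very same objects, so appending to the list in `c` cannot leak into `a`.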
Only strings are supported because interning relies on a pointer-based object identity test. Hashes of other types of classes could be compared, but the objects themselves will never match an identity test. This is true because even though they may be identical, they are not the same objects.

-
As I understand it, the pointer-based object identity test is the *benefit* you gain by interning, not what is required to intern things. If you're asking to intern objects, you're inherently considering any two of them with the same value to have the same identity (which I believe is what Neil G is getting at when he says "hashable and comparable"). That in turn requires that they are immutable, which is not an enforceable property of class instances in Python. I would guess that's the main theoretical reason it's not supported. – Ben Aug 03 '11 at 23:58
-
@Ben: That's a good interpretation of my question. Also, your last point is probably the answer to this question. Feel free to add an answer. – Neil G Aug 04 '11 at 03:57