
I'm quite confused by how Python allocates objects in memory. It seems that the allocation of built-in types doesn't behave consistently. Here is the product of my cogitations on the issue:

a = None
b = None
print( a, b is a) # it outputs True, one single instance of None

a = 'a'
b = 'a'
print( a, b is a) # it outputs True, one single instance of a string

a = 2
b = 2
print( a, b is a) # it outputs True, one single instance of an int


a = 2.5
b = 2.5
print( a, b is a) # it outputs True, one single instance of a float
                  # from the python command line  'b is a' returns False

a = 'a b'
b = 'a b'
print( a, b is a) # it outputs True, one single instance of the same string
                  # from the python command line  'b is a' returns False

a = ()
b = ()
print( a, b is a) # it outputs True, one single instance of a ()

a = {}
b = {}
print( a, b is a) # it outputs False, two different instances of the same empty {}

a = []
b = []
print( a, b is a) # it outputs False, two different instances of the same []

The id return values for a and b show that the is operator is working properly but the 'memory usage optimization' algorithm seems to be working inconsistently.

Do the last two print outputs and the behavior of the Python command-line interpreter reveal an implementation bug, or is Python supposed to behave this way?

I ran those tests in the OpenSUSE 13.1 env. with Python 2.7.6 and Python 3.3.5 (default, Mar 27 2014, 17:16:46) [GCC] on linux.

Apart from the output differences between the command line and the program, what is the reason for this type of optimization? I think it is quite optimistic to assume that programs would save more than 10% of memory on average, unless we consider special cases which should be managed directly by the programmer.

Does this behavior help to effectively minimize memory fragmentation?

  • See also: http://stackoverflow.com/q/15541404/3001761, http://stackoverflow.com/q/21203212/3001761 on strings specifically. – jonrsharpe Jun 24 '14 at 22:20

1 Answer


The difference here is simply that some of those objects are mutable, and some immutable.

It is perfectly safe to let two names that are assigned equal immutable literals, e.g. string literals, refer to one shared object, because two references to the same immutable object won't cause any problems. As you can't change the object in-place, any change will mean a new object, separate from the old one.

However, with mutable types like lists, you could end up in trouble if the interpreter silently made a and b refer to the same object. Mutable objects can be changed in-place, so in your list example, anything you appended to a would end up in b and vice versa.
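
The difference can be seen in a short sketch. The variable names below are illustrative and not taken from the question; the behaviour shown follows from the language rules for strings and lists, not from any particular optimisation.

```python
# Immutable: "changing" a string rebinds the name to a new object.
s1 = "abc"
s2 = s1                # both names refer to the same string object
s1 += "d"              # builds a new string; s2 is untouched
string_alias_unchanged = (s2 == "abc")

# Mutable: two names for one list see each other's changes.
shared = []
alias = shared         # same object, not a copy
alias.append(1)
alias_sees_change = (shared == [1])

# Separate [] literals must give independent lists,
# which is why Python never shares them.
a = []
b = []
a.append(1)
independent = (b == [] and a is not b)

print(string_alias_unchanged, alias_sees_change, independent)
```

If the two empty-list literals were merged into one object, the append to a would also show up in b, which is exactly the trouble described above.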

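The behaviour in the interactive interpreter is different, as each line is compiled separately and these optimisations are not carried out across lines (except for small integers and short identifier-like strings, which CPython caches or "interns"):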

>>> a = "a b"
>>> b = "a b"
>>> a is b
False
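
Sharing can also be requested explicitly. The following is a minimal sketch, assuming Python 3, where the function is sys.intern (Python 2 has a built-in intern instead). The strings are built at run time so that the compiler cannot merge them as constants; the names parts, x, y, xi and yi are illustrative only.

```python
import sys

# Build two equal strings at run time rather than from literals.
parts = ["a", "b"]
x = " ".join(parts)
y = " ".join(parts)

equal_value = (x == y)      # compares contents: always True here
same_object = (x is y)      # implementation detail; typically False on CPython

# sys.intern returns the one canonical object for a given string value.
xi = sys.intern(x)
yi = sys.intern(y)
interned_same = (xi is yi)  # True: both names now refer to the interned object

print(equal_value, same_object, interned_same)
```

Since whether two equal immutable objects are the same object is an implementation detail, == is the reliable way to compare values; is should be kept for identity checks such as comparing against None.
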
You may wish to add an example using [sys.intern](https://docs.python.org/3.0/library/sys.html#sys.intern) to your post, to show that with `a=sys.intern('a b'); b=sys.intern('a b')`, `a is b` returns True for the interned string. – dawg Jun 25 '14 at 00:36