I am new to Python and trying to understand the difference between mutable and immutable objects. One of the mutable types in Python is list. Let's say L = [1,2,3]; then L has an id that identifies the object [1,2,3]. If the content of [1,2,3] is modified, L still retains the same id. In other words, L is still associated with the same object even though the size and content of the object have been altered.
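A minimal sketch of the list behaviour described above (the names `id_before` and `same_object` are only illustrative):

```python
L = [1, 2, 3]
id_before = id(L)

L.append(4)   # the size of the object changes
L[0] = 100    # the content of the object changes

# L is still bound to the very same list object
same_object = id(L) == id_before
print(L, same_object)  # [100, 2, 3, 4] True
```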

With immutable objects, my understanding is that modification of the object isn't allowed. Therefore, if a variable is reassigned a new value, that variable is bound to a new object with a different id. I expected strings to behave in a similar manner. Yet when I tried to modify a string, the string's id didn't change.

string = "blue"
for i in range(10):
    string = string + str(i)
    print("string id after {}th iteration: {}".format(i,id(string)))


string id after 0th iteration: 46958272
string id after 1th iteration: 46958272
string id after 2th iteration: 46958272
string id after 3th iteration: 47077400
string id after 4th iteration: 47077400
string id after 5th iteration: 47077400
string id after 6th iteration: 47077400
string id after 7th iteration: 47077400
string id after 8th iteration: 47077400
string id after 9th iteration: 47077400
DYZ
kbl
  • Possible duplicate of [Mutable vs immutable objects](https://stackoverflow.com/questions/214714/mutable-vs-immutable-objects) – Devesh Kumar Singh May 09 '19 at 05:55
  • the duplicate does in no way answer the question about the `id`s of the strings... – hiro protagonist May 09 '19 at 05:57
  • if you start with an empty string; the `id` remains the same even until the 6th iteration... – hiro protagonist May 09 '19 at 06:00
  • @hiroprotagonist But then it changes and keeps changing. – DYZ May 09 '19 at 06:03
  • Aside: `string` is a poor variable name since it clashes with the `string` module name. – jpmc26 May 09 '19 at 06:08
  • I think it can be some optimization. The ID may be the address in memory. Some objects may reserve more memory than you really need, so when you append at the end they don't have to ask the system for a new, bigger place in memory for the whole new text; they can use the reserved memory to append the new char. This way the address in memory doesn't change. Try `string = str(i) + string` and you get a different ID in every iteration. – furas May 09 '19 at 06:09
  • Possible duplicate of [Why is the id of a Python class not unique when called quickly?](https://stackoverflow.com/questions/20753364/why-is-the-id-of-a-python-class-not-unique-when-called-quickly) See [the docs](https://docs.python.org/3/library/functions.html#id) for `id()`, too. – jpmc26 May 09 '19 at 06:17
  • this may also be related: https://github.com/satwikkansal/wtfpython#-deep-down-were-all-the-same- – hiro protagonist May 09 '19 at 06:36

1 Answer

You really shouldn't see the same ID twice in a row, but CPython has an optimization for string concatenation with `+` that doesn't quite obey all the rules it's supposed to.

When CPython sees an operation of the form `x = x + something` or `x += something`, if `x` refers to a string and `x` holds the only reference to that string, then CPython will grow the string with `realloc` instead of creating a new string object. Depending on the details of available memory, `realloc` may resize the allocated memory in place, or it may allocate new memory. If it resizes the allocation in place, the object's `id` remains the same. You can see the implementation in `unicode_concatenate` in `Python/ceval.c`.
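The "only reference" condition can be probed with a small sketch. The helper `count_id_changes` and its `keep_alias` flag are hypothetical names made up for this illustration. When a second reference to the old string is kept, the old and new strings are alive at the same time, so the id has to change on every iteration. Without the alias the result depends on the CPython version and on what `realloc` happens to do, so no particular number should be expected there.

```python
def count_id_changes(keep_alias, n=10):
    """Count how often id(s) changes across n concatenations."""
    s = "blue"
    changes = 0
    for i in range(n):
        # Optionally hold a second reference to the current string,
        # so that s is no longer the only reference to it.
        alias = s if keep_alias else None
        old_id = id(s)
        s = s + str(i)
        if id(s) != old_id:
            changes += 1
    return changes

with_alias = count_id_changes(keep_alias=True)
without_alias = count_id_changes(keep_alias=False)

print("changes with a second reference:", with_alias)       # always n
print("changes without a second reference:", without_alias)  # implementation-dependent
```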

This optimization is mostly fine, because the refcount check ensures it behaves mostly as if strings were really immutable and a new string was created. However, in `x = x + stuff`, the old string and the new string should have briefly overlapping lifetimes, because the new string should come into existence before the assignment ends the old string's lifetime, so it should be impossible for the ID values to be equal.
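As a rough sketch of what the statement is supposed to do without the optimization, the two steps can be spelled out by hand (the temporary name `tmp` is only for illustration). While the right-hand side is being evaluated, `x` is still bound to the old string, so both objects exist at once and their ids must differ:

```python
x = "blue" + str(0)
old_id = id(x)

tmp = x + "1"                   # right-hand side evaluated; x is still bound to the old string
ids_differ = id(tmp) != old_id  # both objects are alive here, so the ids cannot be equal

x = tmp                         # only now is the name rebound
print(x, ids_differ)  # blue01 True
```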

`id` is one of the few ways in which the optimization is observably different from what would happen if no string mutation occurred. The language developers seem to have decided they're okay with that.

user2357112
  • Please provide a reference that the language guarantees that the "lifetime" of an object should last until the assignment is complete, rather than something else such as being defined at the statement level so that it may be discarded any time during the statement. – jpmc26 May 09 '19 at 09:16
  • @jpmc26: The Python documentation does not define the term "lifetime", though it does use the term. There's no sensible way to define it statement-level, though, and the closest thing to a sensible statement-level definition would still prohibit equal ID values (because the old and new strings are alive during the same statement). – user2357112 May 09 '19 at 09:33
  • While I would like an actual definition of the term (particularly for multi-threaded programs, without assuming a GIL to linearize things), it's hard to argue for any interpretation where two objects that exist at the same time are considered to have non-overlapping lifetimes, and without the optimization, the old and new strings would definitely exist simultaneously. The RHS of the assignment must be fully evaluated before name (re)binding occurs. – user2357112 May 09 '19 at 09:34
  • If your concern is whether the language guarantees that the LHS of an assignment is not discarded early, see the [assignment statement documentation](https://docs.python.org/3/reference/simple_stmts.html#assignment-statements), which says that name (re)binding only occurs once the new object is available. The string concatenation optimization cheats by unbinding the name up front. – user2357112 May 09 '19 at 09:41