
Because Python lists are mutable, extending one (e.g., `mylist.extend(anotherlist)` or `mylist += anotherlist`) does not change the list's id.

I understand that (at least in CPython) lists are contiguous in memory (and that the id happens to be the address of the head of the list). What if the memory after the list is already highly fragmented and the extended list cannot be allocated contiguously (even though there is plenty of free, albeit discontiguous, space in that area)? Does the allocation fail? How is that mitigated?
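For concreteness, here is the behavior in question, as a quick sketch run under CPython:

```python
mylist = []
original_id = id(mylist)

# Growing the list by many elements forces several internal
# reallocations, yet the id (which in CPython is the list object's
# address) never changes.
mylist.extend(range(100_000))
mylist += [0] * 100_000

assert id(mylist) == original_id
```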

MartyMacGyver
    https://github.com/python/cpython/blob/master/Objects/listobject.c – Padraic Cunningham Mar 29 '16 at 19:23
  • The indirect reference via a PyObject makes perfect sense... it just threw me when I was reading a thread about extend vs += I saw a mention of list ids being *direct* pointers to the head of the list, so I ended up asking this. – MartyMacGyver Mar 29 '16 at 20:49

2 Answers


In CPython, this is a difference in the way lists and tuples are allocated. For a list, the object contains a pointer to the memory allocated for the list contents. The list object proper is small, and never needs to move; the address of the vector it points to may change any number of times.

For a tuple object, which is expected to be small most times, the memory for the tuple content is indeed allocated directly in the tuple object. But tuples can't be resized, so your scenario can't arise in that case.
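A small sketch of that contrast: extending a list in place keeps its id, while "growing" a tuple necessarily builds a brand-new object (the id comparison is reliable here because the new tuple is allocated while the old one is still alive):

```python
lst = [1, 2]
lst_id = id(lst)
lst += [3, 4]              # in-place extend; same list object
assert id(lst) == lst_id

tup = (1, 2)
tup_id = id(tup)
tup += (3, 4)              # builds a new tuple, then rebinds the name
assert id(tup) != tup_id
```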

Tim Peters
  • Ok, that answers that then - I wondered if the pointer was to the actual head or had one level of indirection as you described. In hindsight it looks pretty obvious but I really appreciate the response! (In reading a thread about extend vs += I saw a mention of list ids being direct pointers to the head of the list... that's why I ended up asking this.) – MartyMacGyver Mar 29 '16 at 20:45
  • It was a good question! The same thing applies, of course, to other "potentially large mutable objects", like dicts and sets: one allocation for a fixed-size object header (whose "id" - address - never changes), and at least one other allocation for the variable-size part, which the header points _at_. – Tim Peters Mar 29 '16 at 21:15
  • Well, it was more of a CPython thing... it makes perfect sense that one would do it this way, I just mis-read a note I'd found and it threw me off. I'd delete the original question, but perhaps it'll be of use to a newbie. I wonder if i'll have to wear the pocket-protector of shame at PyCon? :-D – MartyMacGyver Mar 29 '16 at 22:47

This is implementation specific, but I assume you are asking about CPython.

As you say, lists are contiguous memory, but what the list holds contiguously is an array of C pointers to PyObjects (`PyObject *`). The objects themselves, be they integers or instances of your own class, are allocated somewhere in dynamic memory, but (probably) not contiguously.
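A quick way to see those pointer semantics: copying a list copies the pointer array, not the objects it refers to.

```python
x = object()
a = [x, x, x]
b = a[:]                    # new list object with its own pointer array

assert b is not a           # distinct list objects...
assert b[0] is a[0] is x    # ...but the pointers target the same PyObject
```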

When a list is first allocated, more memory is obtained than is required, leaving "slack" entries on (conceptually) the right-hand side. A `list.append()` just uses the next slack entry. Of course, eventually the slack entries are all used up.
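The over-allocation is visible from pure Python via `sys.getsizeof()`. This is a rough sketch; the exact growth pattern varies between CPython versions:

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size stays flat for several appends (slack entries being
# consumed), then jumps when a reallocation occurs.
print(sizes)
```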

At that point a reallocation of memory is performed. In C terms that is a call to `realloc()`, although the implementation might not actually use the C run-time heap for this. Basically, a larger block is obtained from dynamic memory, and all the elements of the list are copied to it. Remember, though, that we are copying pointers to the Python objects, not the objects themselves.
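A sketch of what that copy does and does not touch (the `Payload` class here is just an illustrative stand-in for any object):

```python
class Payload:
    """Stand-in for an arbitrary Python object."""

obj = Payload()
lst = [obj]
before = id(lst[0])

# Growing far past the original allocation all but guarantees at least
# one reallocation of the pointer vector.
lst.extend(Payload() for _ in range(100_000))

assert lst[0] is obj           # same object: only its pointer was copied
assert id(lst[0]) == before    # so the element's id is untouched
```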

There is an overhead in doing this, but the use of the slack entries reduces it: appends are amortized O(1). You can see from this why `list.append()` is more efficient than adding an entry on the "left" of the list, which must shift every existing pointer up by one slot.
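A rough benchmark of that asymmetry, using `timeit` (absolute numbers will vary by machine; only the relative gap matters):

```python
from timeit import timeit

n = 10_000
t_append = timeit("lst.append(0)", setup="lst = []", number=n)
t_insert = timeit("lst.insert(0, 0)", setup="lst = []", number=n)

# insert(0, ...) shifts every existing pointer on each call (O(n) per
# operation), while append amortizes to O(1) thanks to the slack entries.
print(f"append: {t_append:.4f}s  insert(0): {t_insert:.4f}s")
```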

If the reallocation fails because there is insufficient memory, then you will get a `MemoryError`. But if you are that short of memory, then creating any new Python object could cause the same thing.

Heap fragmentation can be an issue, and managing it is one reason why Python does some heap management of its own. On recent Windows implementations, the low-level virtual memory APIs are used instead of the C RTL so that Python can perform its own memory management.

cdarke
  • Yes, this concerned CPython and I understand it now. – MartyMacGyver Mar 29 '16 at 20:46
  • It was a good question. If you know the implementation then you know why you should use `append()` in preference to `extend()`. – cdarke Mar 30 '16 at 06:37
  • I'm not sure why that is... `extend()` and `append()` don't seem that different in the implementation (CPython code referenced in @Padraic's comment above). Further, per the docs and the long discussion at http://stackoverflow.com/questions/252703/python-append-vs-extend the big difference (looking at the code) is that `extend()` iterates - which would save multiple manual calls to `append()`. Between that and the optimizations in the code, it seems `extend()` would be generally faster when adding the elements from one list to another. Thoughts? – MartyMacGyver Mar 30 '16 at 07:18
  • It's a question of numbers. For many elements, `extend` is more efficient, you are right. But it should not be used for single elements - something I have seen. There is a discussion here which might be of interest http://stackoverflow.com/questions/252703/python-append-vs-extend – cdarke Mar 30 '16 at 07:28
  • I see your point now and agree (that's the same discussion I referenced in my reply above so that's why I asked). – MartyMacGyver Mar 30 '16 at 07:41