21

I have a list my_list (the list contains utf8 strings):

>>> len(my_list)
8777
>>> getsizeof(my_list)                     #  <-- note the size
77848

For some reason, a sorted list (my_sorted_list = sorted(my_list)) uses more memory :

>>> len(my_sorted_list)
8777
>>> getsizeof(my_sorted_list)              #  <-- note the size
79104

Why is sorted returning a list that takes more space in memory than the initial unsorted list?

jcuenod
  • 55,835
  • 14
  • 65
  • 102
  • 1
    As [@Jim's answer](http://stackoverflow.com/a/40317620/3124746) pointed out that `sorted` creates new list you can follow story with [my recent question(list() uses more memory than list comprehension)](http://stackoverflow.com/questions/40018398/list-uses-more-memory-than-list-comprehension) it would give you some python insights. – vishes_shell Nov 01 '16 at 19:45
  • 1
    @vishes_shell Or also https://stackoverflow.com/questions/7247298/size-of-list-in-memory (asked in 2011) – jcuenod Nov 01 '16 at 22:17
  • 1
    I really liked the charts in your answer to your question @vishes_shell :-). The only issue I see with these sort of questions and answers is that they might *suddenly* become obsolete at some point because we're dealing with an implementation detail :( – Dimitris Fasarakis Hilliard Nov 02 '16 at 06:44
  • 1
    @JimFasarakis-Hilliard that's true, although this particular implementation detail has survived the move from python2 to python3. – jcuenod Nov 02 '16 at 20:33
  • 1
    @jcuenod True, true. Just take a look at the `dict` though, it's implementation survived for long until [it got a pretty big change in `3.6`](http://stackoverflow.com/questions/39980323/dictionaries-are-ordered-in-python-3-6) ;-). – Dimitris Fasarakis Hilliard Nov 02 '16 at 20:56

2 Answers2

18

As Ignacio points out, this is due to Python allocating a bit more memory than required. This is done in order to perform O(1) .appends on lists.

sorted creates a new list out of the sequence provided, sorts it in place and returns it. To create the new list, Python extends an empty sized list with the one passed; that results in the observed over-allocation (which happens after calling list_resize). You can corroborate the fact that sorting isn't the culprit by using list.sort; the same algorithm is used without a new list getting created (or, as it's known, it's performed in-place). The sizes there, of course, don't differ.

It is worth noting that this difference is mostly present when:

So, with a list-comp:

l = [i for i in range(10)]

getsizeof(l)          # 192
getsizeof(sorted(l))  # 200

Or a list literal:

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

getsizeof(l)          # 144
getsizeof(sorted(l))  # 200

the sizes are smaller (more so with use of literals).

When creating through list, the memory is always over-allocated; Python knows the sizes and preempts future modifications by over-allocating a bit based on the size:

l = list(range(10))

getsizeof(l)          # 200
getsizeof(sorted(l))  # 200

So you get no observed difference in the sizes of the list(s).


As a final note, I must point out that this is behavior specific the C implementation of Python i.e CPython. It's a detail of how the language was implemented and as such, you shouldn't depend on it in any wacky way.

Jython, IronPython, PyPy and any other implementation might/might not have the same behavior.

Community
  • 1
  • 1
Dimitris Fasarakis Hilliard
  • 150,925
  • 31
  • 268
  • 253
13

The list resize operation overallocates in order to amortize appending to the list versus starting with a list preallocated by the compiler.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • 1
    Thanks for the link to Python's code! That's awesome... I would have accepted but the detail in @Jim's answer won me over. – jcuenod Oct 29 '16 at 18:19