1

I know that when you append into a list in python, the amount of allocated memory for the list will slowly grow. As you reach the limit of the preallocated portion, new memory will be allocated and the existing items will be copied over.

What happens when you cast an iterator to a list? Does it perform this same process or is it more intelligent?

What about you are casting something that implements a __len__ method to a list, does it allocated obj.__len__() memory slots to reduce the amount of copying during the casting process?

In summary, is there any difference between the following three snippets in terms of how the memory for the list list_of_interest is allocated?

Standard append

list_of_interest = []
for i in range(1000):
    list_of_interest.append(i)

Casting an iterator

list_of_interest = list(range(1000))

Casting an object with __len__

b = tuple(range(1000))
list_of_interest = list(b)

Thank you for your help!

D Hudson
  • 1,004
  • 5
  • 12
  • 1
    This isn't a cast (at least, not in any sense that I consider the term should cover). `list` consumes the iterator it is given and produces a new instance of `list`. – chepner Aug 03 '21 at 13:03
  • 2
    But IIRC, `list` will try to immediately allocate enough space for the list up front if `__len__` can be used to provide that information, rather than appending elements one at a time to an initially empty list. (Note that `range` *also* defines `len`, since it can compute exactly how many integers are in the half-open interval [0, 1000).) – chepner Aug 03 '21 at 13:05
  • 1
    Types can also provide a `__length_hint__` method that can provide an estimated size, which `list` could use the same way to at least pre-allocate *some* memory that will be expected to be used. – chepner Aug 03 '21 at 13:20
  • 1
    Maybe test a bit like [this](https://stackoverflow.com/q/60549865/12671057). – Kelly Bundy Aug 03 '21 at 13:33

2 Answers2

1

When __len__ is defined, list (at least, in CPython; other implementations may differ) will use the iterable's reported size to preallocate an array exactly large enough to hold all the iterable's elements.

Your 2nd and 3rd examples are actually identical, because range does provide __len__ (as it's trivial to compute the number of integers in a range with defined end points).

An example of an iterable that cannot define __len__ is filter. Even if an instance of filter knows how big its iterable argument is, it won't know how many of them it will yield without actually iterating over it. So something like list(filter(odd, range(1000))) would have to start with an empty list and append to it as filter returns odd integers.

There is a __length_hint__ method that containers can define to provide at least an estimate as to how many items they contain. I don't seen any evidence in listobject.c, though, that list will make use of it to preallocate its array.

chepner
  • 497,756
  • 71
  • 530
  • 681
1

We can get a bit more insight into what's going on by disassembling the each of the code examples with the dis module.

Standard Append:

dis.dis("""
list_of_interest = []
for i in range(1000):
    list_of_interest.append(i)
""")

gives us

  2           0 BUILD_LIST               0
              2 STORE_NAME               0 (list_of_interest)

  3           4 SETUP_LOOP              26 (to 32)
              6 LOAD_NAME                1 (range)
              8 LOAD_CONST               0 (1000)
             10 CALL_FUNCTION            1
             12 GET_ITER
        >>   14 FOR_ITER                14 (to 30)
             16 STORE_NAME               2 (i)

  4          18 LOAD_NAME                0 (list_of_interest)
             20 LOAD_METHOD              3 (append)
             22 LOAD_NAME                2 (i)
             24 CALL_METHOD              1
             26 POP_TOP
             28 JUMP_ABSOLUTE           14
        >>   30 POP_BLOCK
        >>   32 LOAD_CONST               1 (None)
             34 RETURN_VALUE

Looking closely we see were are calling the append method in a loop (just like we wrote). Digging into the implementation (here) we see that this eventually calls app1 which calls list_resize which over allocates to try to reduce reallocations.

Other Two Examples:

As noted in other answers range actually does implement a __len__ making

list_of_interest = list(range(1000))

and

b = tuple(range(1000))
list_of_interest = list(b)

work pretty similarly.

Disassembling the first gives:

  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (range)
              4 LOAD_CONST               0 (1000)
              6 CALL_FUNCTION            1
              8 CALL_FUNCTION            1
             10 STORE_NAME               2 (list_of_interest)
             12 LOAD_CONST               1 (None)
             14 RETURN_VALUE

and the second:

  2           0 LOAD_NAME                0 (tuple)
              2 LOAD_NAME                1 (range)
              4 LOAD_CONST               0 (1000)
              6 CALL_FUNCTION            1
              8 CALL_FUNCTION            1
             10 STORE_NAME               2 (b)

  3          12 LOAD_NAME                3 (list)
             14 LOAD_NAME                2 (b)
             16 CALL_FUNCTION            1
             18 STORE_NAME               4 (list_of_interest)
             20 LOAD_CONST               1 (None)
             22 RETURN_VALUE

In both cases we call the list constructor defined here. We see that if the provided iterable to list has a length, then we call list_preallocate_exact which acts how you expect. Otherwise we call list_extend on an empty list. In the case that our iterator is not a list or tuple (which in this case we know it isn't), we'll call PyObject_LengthHint to guess the length of the iterator and preallocate space that way. Since the length hint might be too large or too small, we may need to allocate more memory as we loop through the iterator in the standard way and we may deallocate space after we're done looping in the case that the hint was too large.

Other situations:

We can play around with other examples, but we see that we'll hit the list constructor above which has several special cases. Often we'll next go to list_extend which also has several special cases. In the case where our iterator doesn't implement length or length hint, we default to a length hint of 8. Why 8? You'd have to ask the maintainers that one.

Conclusion:

If Python can know the needed length of the resulting list from the start, you'll only have one allocation. If it has to guess, you may have more than one or a deallocation. If it can't guess in the case of appending in a loop or being provided an iterator with no length or length hint, you'll have multiple allocations.

Kyle Parsons
  • 1,475
  • 6
  • 14