
Over the past few days I've been trying to get a better understanding of computational complexity and how to improve Python code. To do this, I have tried out different functions for calculating Fibonacci numbers, comparing how long the script runs when I make small changes.

I'm calculating Fibonacci numbers using a list, appending the sum of elements -2 and -1 of the list.

I was puzzled to find that if I add a .pop() in the loop, deleting the elements of my list that are no longer needed, my script runs significantly faster. I don't see why this is: in each step of the loop the computer does one more thing, so my untrained intuition suggests that this should increase computational time. Is 'looking up' the last element of the list so much slower when the list is very long?

Here is my code:

import time
import numpy as np

def fib_stack1(n):
    """ Original function """
    assert type(n) is int, 'Expected an integer as input.'
    if n < 2:
        return n
    else:
        stack = [0, 1]
        for i in range(n-1):
            stack.append(stack[-1] + stack[-2])
        return stack[-1]

def fib_stack2(n):
    """ Modified function """
    assert type(n) is int, 'Expected an integer as input.'
    if n < 2:
        return n
    else:
        stack = [0, 1]
        for i in range(n-1):
            stack.append(stack[-1] + stack[-2])
            ### CHANGE ###
            stack.pop(-3)
            ##############
        return stack[-1] 


rec1 = []
rec2 = []
for _ in range(10):
    t1 = time.time()
    fib_stack1(99999)  
    t2 = time.time()
    rec1.append(t2-t1)
    t1 = time.time()
    fib_stack2(99999)  
    t2 = time.time()
    rec2.append(t2-t1)
print(np.array(rec1).mean())
print(np.array(rec2).mean())

The output (mean runtime in seconds) is the following:

# Original 
0.26878631115
# Modified
0.145034956932

4 Answers


A list stores its elements in memory in a contiguous way.

So the append method of the list object needs to resize the allocated memory block from time to time (not every time append is called, fortunately).

Sometimes, the system is able to resize "in-place" (allocates further memory just after the current memory block), and sometimes not: it has to find a contiguous block of memory big enough to store the new list.

When the resize is not "in-place", existing data needs to be copied. (Note that this doesn't happen when the size of the list decreases.)

So the fewer elements there are in the list when it has to be copied, the faster the operation is.

Note that list.append remains extremely fast: adding at the end of a list is the fastest way to grow it (compared to insert, which has to shift elements every time to free up its "slot").
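
You can watch this over-allocation with sys.getsizeof. A minimal sketch (the exact sizes and growth points depend on the CPython version):

import sys

lst = []
last_size = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # The underlying buffer only grows at certain lengths,
        # not on every single append.
        print(len(lst), size)
        last_size = size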

Jean-François Fabre

Is 'looking up' the last element of the list so much slower when the list is very long?

No, list length has no effect on lookup speed. These are arraylists, not linked lists. It's more likely that this is related to memory allocation or cache performance. The garbage collector is also involved.

When you delete unneeded list elements, Python never has to allocate a bigger buffer for the list. It may also be able to reuse the memory allocated for the int objects instead of requesting more memory from the OS. Considering how huge your integers get, reusing their memory is a big deal. (The details of memory allocation depend on the Python version and the underlying standard library allocator. Python 2 has a free list for ints, but not longs; Python 3 has no free list for ints. Python itself makes no effort to reuse allocations for large objects, but the underlying allocator might be doing something.)

Additionally, when you have to keep allocating new integers, especially ones as huge as the 99999th Fibonacci number, you're not going to get much benefit out of your CPU's cache. Main memory access is much slower than cache.

Finally, the allocation pattern of your fib_stack1 (lots of allocations, not so many object refcounts falling to 0) triggers Python's cycle-detector system, a.k.a. the garbage collector, which takes time to run and touches a lot of memory that didn't need touching, hurting cache performance. Temporarily disabling the collector produces a notable speedup for fib_stack1 in my own tests, especially on Python 3.
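
For instance, here is one way to check the collector's impact, assuming the fib_stack1 from the question is already defined (a sketch; exact numbers will vary by machine and Python version):

import gc
import time

t1 = time.time()
fib_stack1(99999)
print('gc enabled: ', time.time() - t1)

gc.disable()              # temporarily switch off the cycle detector
try:
    t1 = time.time()
    fib_stack1(99999)
    print('gc disabled:', time.time() - t1)
finally:
    gc.enable()           # always restore the collector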

user2357112
  • Using this as an opportunity for a redirect. The answer by @jean prompted me to start researching contiguous memory for python lists, which was surprising for me considering how much emphasis `numpy` puts on this for speed (implying that it's part of the reason it's faster than Python), and I've seen you pop up a few times in my search. I assume the contiguous issue then applies in the case of multi-dimensional lists/arrays, where they start to differ? Do you know of any good answers to direct my search to understand this? – roganjosh Aug 11 '17 at 21:22
  • @roganjosh: I'm not actually sure what you're asking. Lists are noncontiguous in the sense that they store pointers to their elements, but the pointers are stored contiguously. NumPy arrays aren't always contiguous, particularly arrays that are views of other data, but unlike lists, they don't store an array of pointers (unless they're `object` arrays). – user2357112 Aug 11 '17 at 21:32
  • @roganjosh: You might be interested in the [NumPy internals documentation](https://docs.scipy.org/doc/numpy/reference/internals.html) and the [list object implementation](https://github.com/python/cpython/blob/master/Objects/listobject.c). If you have plenty of free time, you might be interested in reading some of the [NumPy core implementation](https://github.com/numpy/numpy/tree/master/numpy/core/src). – user2357112 Aug 11 '17 at 21:36
  • And that immediately highlights the limitation of my thought process, thanks. I was thinking along the lines of, say, a 2D list of ints of mismatched dimensions, in which case I wasn't seeing why memory allocation should be different between numpy (which then wouldn't be contiguous) and the python list. – roganjosh Aug 11 '17 at 21:36
  • It went back to a previous discussion in comments in which I wasn't sure why a 3D array containing a square 2D array couldn't be contiguous for the 2D array (I was trying to store a matrix and metadata about that matrix in a single array). You suggested that I store metadata and the square matrix separately. I don't know C so it's tough for me to truly get a view of what's happening, but your first comment shows my oversight. Thanks again. – roganjosh Aug 11 '17 at 21:45

No, looking up any element in a list takes the same amount of time (what is known as constant-time behavior in computer science). Adding the call to pop does increase the work needed in each loop iteration a little, but the list never gets bigger than 3 elements. In your first version, the list grows in each iteration, and such an operation can either be completely free or quite expensive, depending on how much additional memory the list has actually allocated under the hood, information which is not directly accessible.

Basically, when you instantiate a list, some additional space is preallocated, making room for future appends at the expense of "wasting" space. If the list fills up, it needs to be enlarged for further appends to happen, and so these particular appends are much more expensive than usual. If some other data is already present in memory at the end of the array, all of the data (actually just pointers) in the list has to be copied over to a new memory location, where the entire new list can be stored in one contiguous chunk of memory.

For more information about list growth behavior (in CPython only, as this is implementation-specific), see e.g. here
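
A quick way to check the constant-time lookup claim (a rough sketch; absolute numbers depend on your machine):

from timeit import timeit

short_list = list(range(10))
long_list = list(range(10**7))

# Indexing the last element costs about the same regardless of list length.
print(timeit(lambda: short_list[-1], number=10**6))
print(timeit(lambda: long_list[-1], number=10**6))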

jmd_dk

The overhead of maintaining the list outweighs the additional operation. This can be shown by inspecting the size of the resulting list as the functions run (a sketch of how such figures can be gathered follows the notes below):

# Func      List Size (Sum of Elements Size)
fib_stack1: 120 (72)
fib_stack1: 90136 (4888568)
fib_stack1: 162720 (19034164)
fib_stack1: 260864 (42436332)
fib_stack1: 330256 (75095060)
fib_stack1: 418080 (117010332)
fib_stack1: 529224 (168182184)
fib_stack1: 595424 (228610568)
fib_stack1: 669896 (298295536)
fib_stack1: 753680 (377237048)

fib_stack2: 112 (48)
fib_stack2: 112 (1904)
fib_stack2: 112 (3752)
fib_stack2: 112 (5608)
fib_stack2: 112 (7456)
fib_stack2: 112 (9312)
fib_stack2: 112 (11160)
fib_stack2: 112 (13008)
fib_stack2: 112 (14864)
fib_stack2: 112 (16712)

fib_stack3: 48
fib_stack3: 1904
fib_stack3: 3752
fib_stack3: 5608
fib_stack3: 7456
fib_stack3: 9312
fib_stack3: 11160
fib_stack3: 13008
fib_stack3: 14864
fib_stack3: 16712

NOTE:

  • In the first case, Python doesn't know you don't need the old values, so it has to keep every previous value around, which includes, as others have mentioned, copying the whole list over whenever it needs to allocate more memory.
  • In the second case, Python doesn't have to store the old values, so the list never grows: each iteration appends one item and pops another. The pop removes item zero and shifts the remaining items back one slot, then the loop continues.
  • In the third case (described below), we drop the list intermediate entirely and just apply the transform manually. This is why the element-size total for fib_stack2 matches the combined size of the two integers in fib_stack3.
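
For reference, size figures like the ones above can be gathered with sys.getsizeof. The following is only a sketch of such instrumentation (the fib_stack1_sizes name is made up here), not necessarily what was used to produce the table:

import sys

def fib_stack1_sizes(n, report_every=10000):
    """ Hypothetical instrumented variant of fib_stack1 from the question """
    stack = [0, 1]
    for i in range(n - 1):
        stack.append(stack[-1] + stack[-2])
        if i % report_every == 0:
            # Size of the list object itself vs. total size of the ints it holds.
            list_size = sys.getsizeof(stack)
            elem_size = sum(sys.getsizeof(x) for x in stack)
            print('fib_stack1:', list_size, '(%d)' % elem_size)
    return stack[-1]

fib_stack1_sizes(99999)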

Further improvements can be had by not building a list in the first place:

def fib_stack3(n):
    """ Modified function """
    assert type(n) is int, 'Expected an integer as input.'
    if n < 2:
        return n
    else:
        second_anc, first_anc = 0, 1
        for i in range(n-1):
            second_anc, first_anc = first_anc, second_anc + first_anc
        return first_anc 

Running all three on repl.it gives:

fib_stack1 1.3875333309173583
fib_stack2 0.41049718856811523
fib_stack3 0.33348443508148196

Clearly the last case, where we don't bother with a list at all, wins out.

This is because we're no longer shuffling a list at all; we're just rebinding the same two names. The single-line swap works because Python evaluates the right-hand side before assigning to the left-hand side.
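
A minimal illustration of that evaluation order:

a, b = 0, 1
a, b = b, a + b   # the right-hand tuple (1, 0 + 1) is built first, then unpacked
print(a, b)       # prints: 1 1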

TemporalWolf