When we take the sum of a list, we designate an accumulator (memo) and then walk through the list, applying the binary function "x+y" to each element and the accumulator. Procedurally, this looks like:
def mySum(list):
    memo = 0
    for e in list:
        memo = memo + e
    return memo
This is a common pattern, and useful for things other than taking sums — we can generalize it to any binary function, which we'll supply as a parameter, and also let the caller specify an initial value. This gives us a function known as reduce, foldl, or inject[1]:
def myReduce(function, list, initial):
    memo = initial
    for e in list:
        memo = function(memo, e)
    return memo
def mySum(list):
    return myReduce(lambda memo, e: memo + e, list, 0)
In Python 2, reduce was a built-in function, but in Python 3 it's been moved to the functools module:
from functools import reduce
We can do all kinds of cool stuff with reduce depending on the function we supply as its first argument. If we replace "sum" with "list concatenation", and "zero" with "empty list", we get the (shallow) copy function:
def myCopy(list):
    return reduce(lambda memo, e: memo + [e], list, [])
myCopy(range(10))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If we add a transform function as another parameter to copy, and apply it before concatenating, we get map:
def myMap(transform, list):
    return reduce(lambda memo, e: memo + [transform(e)], list, [])
myMap(lambda x: x*2, range(10))
> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
If we add a predicate function that takes e as a parameter and returns a boolean, and use it to decide whether or not to concatenate, we get filter:
def myFilter(predicate, list):
    return reduce(lambda memo, e: memo + [e] if predicate(e) else memo, list, [])
myFilter(lambda x: x%2==0, range(10))
> [0, 2, 4, 6, 8]
map and filter are sort of unfancy ways of writing list comprehensions — we could also have said [x*2 for x in range(10)] or [x for x in range(10) if x%2==0]. There's no corresponding list comprehension syntax for reduce, because reduce isn't required to return a list at all (as we saw with sum, earlier, which Python also happens to offer as a built-in function).
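A few quick asserts (a check of my own, using the functions defined above) confirm that the hand-rolled versions agree with the comprehensions and with the built-in sum:
assert myMap(lambda x: x*2, range(10)) == [x*2 for x in range(10)]
assert myFilter(lambda x: x % 2 == 0, range(10)) == [x for x in range(10) if x % 2 == 0]
assert mySum(range(10)) == sum(range(10)) == 45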
It turns out that for computing a running sum, the list-building abilities of reduce are exactly what we want, and probably the most elegant way to solve this problem, despite its reputation (along with lambda) as something of an un-pythonic shibboleth. The version of reduce that leaves behind copies of its old values as it runs is called reductions or scanl[1], and it looks like this:
def reductions(function, list, initial):
    return reduce(lambda memo, e: memo + [function(memo[-1], e)], list, [initial])
So equipped, we can now define:
def running_sum(list):
    first, rest = list[0], list[1:]
    return reductions(lambda memo, e: memo + e, rest, first)
running_sum(range(10))
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
While conceptually elegant, this precise approach fares poorly in practice with Python. Because Python's list.append() mutates a list in place but doesn't return it, we can't use it effectively in a lambda, and have to use the + operator instead. This constructs a whole new list, which takes time proportional to the length of the accumulated list so far (that is, an O(n) operation). Since we're already inside the O(n) for loop of reduce when we do this, the overall time complexity compounds to O(n²).
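To see the first point concretely (a small illustration of my own, not from the original): list.append returns None, so a lambda built around it poisons the accumulator after the first step:
reduce(lambda memo, e: memo.append(e), range(10), [])
> AttributeError: 'NoneType' object has no attribute 'append'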
In a language like Ruby[2], where array.push e returns the mutated array, the equivalent runs in O(n) time:
class Array
  def reductions(initial, &proc)
    self.reduce [initial] do |memo, e|
      memo.push proc.call(memo.last, e)
    end
  end
end
def running_sum(enumerable)
  first, rest = enumerable.first, enumerable.drop(1)
  rest.reductions(first, &:+)
end
running_sum(0...10)
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
The same is true in JavaScript[2], whose array.push(e) returns the array's new length (not the array itself), but whose anonymous functions allow us to include multiple statements, which we can use to separately specify a return value:
function reductions(array, callback, initial) {
  return array.reduce(function(memo, e) {
    memo.push(callback(memo[memo.length - 1], e));
    return memo;
  }, [initial]);
}
function runningSum(array) {
  var first = array[0], rest = array.slice(1);
  return reductions(rest, function(memo, e) {
    return memo + e;
  }, first);
}
function range(start, end) {
  return Array.apply(null, Array(end - start)).map(function(e, i) {
    return start + i;
  });
}
runningSum(range(0, 10));
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
So, how can we solve this while retaining the conceptual simplicity of a reductions function that we just pass lambda x, y: x + y to in order to create the running sum function? Let's rewrite reductions procedurally. We can fix the accidentally quadratic problem, and while we're at it, pre-allocate the result list to avoid heap thrashing[3]:
def reductions(function, list, initial):
    result = [None] * (len(list) + 1)
    result[0] = initial
    for i in range(len(list)):
        result[i + 1] = function(result[i], list[i])
    return result
def running_sum(list):
    first, rest = list[0], list[1:]
    return reductions(lambda memo, e: memo + e, rest, first)
running_sum(range(0,10))
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
This is the sweet spot for me: O(n) performance, and the optimized procedural code is tucked away under a meaningful name where it can be re-used the next time you need to write a function that accumulates intermediate values into a list.
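For instance (a hypothetical reuse of my own, not from the original), swapping in a different binary function gives a running maximum with no extra machinery:
def running_max(list):
    first, rest = list[0], list[1:]
    return reductions(lambda memo, e: max(memo, e), rest, first)
running_max([3, 1, 4, 1, 5, 9, 2, 6])
> [3, 3, 4, 4, 5, 9, 9, 9]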
[1] The names reduce/reductions come from the LISP tradition, foldl/scanl from the ML tradition, and inject from the Smalltalk tradition.
[2] Python's list and Ruby's Array are both implementations of an automatically resizing data structure known as a "dynamic array" (or std::vector in C++). JavaScript's Array is a little more baroque, but behaves identically provided you don't assign to out-of-bounds indices or mutate Array.length.
[3] The dynamic array that forms the backing store of a Python list resizes itself whenever an append would exceed its current capacity (CPython over-allocates by a constant growth factor rather than doubling exactly, but the principle is the same). Resizing typically means allocating a new, larger block on the heap, copying the contents of the old block into the new one, and returning the old block's memory to the allocator. This is an O(n) operation, but because it happens less and less frequently as the list grows larger, the time complexity of appending to a list works out to O(1) amortized. However, the "hole" left by the old block can sometimes be difficult to recycle, depending on its position in the heap. Even with garbage collection and a robust memory allocator, pre-allocating an array of known size can save the underlying systems some work. In an embedded environment without the benefit of an OS, this kind of micro-management becomes very important.
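As a rough way to watch that over-allocation happen (a small sketch of my own, relying on the fact that sys.getsizeof reports a list's currently allocated size rather than just its length):
import sys
xs = []
last = None
for i in range(64):
    xs.append(i)
    size = sys.getsizeof(xs)
    if size != last:
        print(len(xs), size)  # the allocation grows in occasional jumps, not on every append
        last = size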