7

I'm looking for a Python and/or NumPy vectorization method to eliminate the use of a for-loop in the following:

for i in list_range_values:
    v[list_list_values[i]] += list_comp_values[i]

where:

  • list_range_values is a Python list of integer values, e.g. [1, 3, 5], drawn from range(0, R) (i.e. the integers 0 through R-1)

  • list_comp_values is a Python list of numeric values, e.g. [0.7, 9.8, 1.2, 5, 10, 11.7, 6, 0.2], such that len(list_comp_values) = R

  • v is a NumPy vector of length V, where R may be less than, equal to, or greater than V

  • list_list_values is a Python list of lists (each list containing a different number of integer values, e.g. [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8], [0, 1], [21, 31, 41], [9, 11, 22, 33, 44], [17, 19]]), with values drawn from range(0, V) and len(list_list_values) = R

E.g.

for i in list_range_values (= [1, 3, 5]):
    i=1: v[[5, 7, 11, 25, 99]] += list_comp_values[1] (= 9.8)
    i=3: v[[4, 7, 8]] += list_comp_values[3] (= 5)
    i=5: v[[21, 31, 41]] += list_comp_values[5] (= 11.7)

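For concreteness, a minimal runnable version of the example (a sketch assuming V = 100, large enough for the maximum index 99, and v starting at zero):

import numpy as np

list_range_values = [1, 3, 5]
list_comp_values = [0.7, 9.8, 1.2, 5, 10, 11.7, 6, 0.2]
list_list_values = [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8],
                    [0, 1], [21, 31, 41], [9, 11, 22, 33, 44], [17, 19]]

v = np.zeros(100)
for i in list_range_values:
    v[list_list_values[i]] += list_comp_values[i]

print v[7]  # 14.8: index 7 appears in both row 1 (+9.8) and row 3 (+5)
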
Is there a method available that allows eliminating the for-loop?

Cython, Scipy/Weave/Blitz and a C module are alternative solutions, but I want to make sure there isn't a NumPy vectorization answer first.

Henry Thornton
  • This is a re-formulation of a previous question at http://stackoverflow.com/questions/8695806/python-list-comprehension-for-numpy, which will be deleted later today. – Henry Thornton Jan 02 '12 at 13:05
  • Why do you want to get rid of the loop? It seems very appropriate to me. – Niklas B. Jan 02 '12 at 14:16
  • I had figured out a way for your previous question (using `numpy.bincount`), but for this one, where you need arbitrary weights for each set of `list_list_values`, I don't think there's a way – Ricardo Cárdenes Jan 02 '12 at 14:19
  • @Niklas Baumstark. The for-loop is appropriate, but not for delivering performance when the lists and lists-of-lists contain millions of items. – Henry Thornton Jan 02 '12 at 14:25
  • @dbv: I see. Maybe it would be easiest to write that loop in C if numpy does not provide the needed functionality (which I don't know). – Niklas B. Jan 02 '12 at 14:27
  • Are list_comp_values and list_list_values supposed to be the same length? – jterrace Jan 02 '12 at 16:20
  • @jterrace. list_comp_values and list_list_values can be of different lengths. list_range_values is drawn from the integer range(0, R-1, 1) and len(list_comp_values) = R. – Henry Thornton Jan 02 '12 at 16:37
  • @dbv You are selecting the two of them with the same ``i``, so it makes no sense for one to be longer than the other. – jterrace Jan 02 '12 at 16:46
  • @jterrace. Apologies, you're correct. Question has been edited to include this information. – Henry Thornton Jan 02 '12 at 16:53
  • It's too bad no one wrote up an answer using `numpy.bincount`, I've used it recently for some stuff and it's much faster than I realized. – Bi Rico Feb 29 '12 at 21:09

3 Answers

6

While eliminating for loops and taking advantage of numpy built-ins/vectorization often results in a massive speedup, I would point out that this is not always the case. Timing the simple for loop against the much more involved vectorized methods below shows that the vectorization does not give you a large speedup here and is much more verbose. Just something to consider:

from timeit import Timer

setstr="""import numpy as np
import itertools
import random

Nlists = 1000
total_lists = 5000
outsz = 100
maxsublistsz = 100


# create random list of lists
list_range_values = random.sample(xrange(total_lists),Nlists)
list_list_values = [random.sample(xrange(outsz),np.random.randint(1,maxsublistsz)) for k in xrange(total_lists)]

list_comp_values = 10*np.random.uniform(size=(total_lists,))

v = np.zeros((outsz,))

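# Helper (used by method1): build one flat array containing every integer
# in [start[k], end[k]) for each k, using cumsum tricks instead of a loop.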
def indices(start, end):
    lens = end - start
    np.cumsum(lens, out=lens)
    i = np.ones(lens[-1], dtype=int)
    i[0] = start[0]
    i[lens[:-1]] += start[1:]
    i[lens[:-1]] -= end[:-1]
    np.cumsum(i, out=i)
    return i

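# Helper (used by method3): sum the entries of ``values`` that share a
# label in ``groups``. Sort by group, take a running cumsum, then
# difference at the group boundaries; returns (sums, unique_groups).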
def sum_by_group(values, groups):
    order = np.argsort(groups)
    groups = groups[order]
    values = values[order]
    values.cumsum(out=values)
    index = np.ones(len(groups), 'bool')
    index[:-1] = groups[1:] != groups[:-1]
    values = values[index]
    groups = groups[index]
    values[1:] = np.diff(values)
    return values, groups


"""

method1="""
list_list_lens = np.array(map(len, list_list_values))
comp_vals_expanded = np.repeat(list_comp_values, list_list_lens)

list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values),dtype=int)
list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1]))

toadd = indices(list_list_starts[list_range_values],(list_list_starts + list_list_lens)[list_range_values])

v[list_vals_flat[toadd]] += comp_vals_expanded[toadd]
"""

method2="""
for k in list_range_values:
    v[list_list_values[k]] += list_comp_values[k]

"""

method3="""
llv = [list_list_values[i] for i in list_range_values]
lcv = [list_comp_values[i] for i in list_range_values]
counts = map(len, llv)
indices = np.concatenate(llv)
values = np.repeat(lcv, counts)

totals, indices_unique = sum_by_group(values, indices)
v[indices_unique] += totals
"""


t1 = Timer(method1,setup=setstr).timeit(100)
print t1

t2 = Timer(method2,setup=setstr).timeit(100)
print t2

t3 = Timer(method3,setup=setstr).timeit(100)
print t3

For a pretty large number of elements in the list:

Method1: (no for loop -jterrace) 1.43 seconds

Method2: (for loop) 4.62 seconds

Method3: (no for loop - bago) 2.99 seconds

For a small number of lists (change Nlists to 10), the for loop is significantly faster than jterrace's solution:

Method1: (no for loop -jterrace) 1.05 seconds

Method2: (for loop) 0.045 seconds

Method3: (no for loop - bago) 0.041 seconds

This is not to knock @jterrace's or @Bago's solutions, which are quite elegant. Rather, it is to point out that oftentimes a simple for loop does not perform that poorly.
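
For reference, a sketch of the `numpy.bincount` variant mentioned in the comments (assuming the same variables as the timing script above, non-negative integer indices below `len(v)`, and numpy >= 1.6 for the `minlength` argument):

llv = [list_list_values[i] for i in list_range_values]
lcv = [list_comp_values[i] for i in list_range_values]
indices = np.concatenate(llv)
values = np.repeat(lcv, map(len, llv))

# bincount sums the weight of every occurrence of each index, so
# repeated indices accumulate correctly; minlength pads the result
# out to the full length of v
v += np.bincount(indices, weights=values, minlength=len(v))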

JoshAdel
  • Thank-you so much for your timings as it was exactly what I was wondering. The motivation for my question was that our lists and lists-of-lists are very large (in the millions) and will get larger over time. We use Numpy vectorization extensively and this was the only for-loop remaining in the system. – Henry Thornton Jan 02 '12 at 19:24
  • Very nice, but ``import itertools`` should be put in the setup, not the method. 2.89 seconds for Nlists=10 seems fishy. – jterrace Jan 02 '12 at 19:52
  • Also, if you change ``map`` to ``itertools.imap``, it might make some difference. – jterrace Jan 02 '12 at 19:55
  • @JoshAdel, nice. I'd be curious to see how my [`numpy.histogram`](http://stackoverflow.com/a/8704819/577088) solution stacks up to the above... – senderle Jan 02 '12 at 20:03
  • @dbv if the number is in the millions, vectorization might provide significant speedup if you don't run into memory issues. I also think taking a look at cython is a good idea. I've personally had a lot of success with it in the past for getting large speed-ups for things that either don't vectorize well or would have required a large intermediate array that didn't fit into memory. – JoshAdel Jan 02 '12 at 20:04
  • @jterrace, my mistake. I've removed the import in the method and re-ran the tests. I've updated the results. – JoshAdel Jan 02 '12 at 20:06
  • Timing of Bago's method versus for-loop on live data showed a marginal difference (~5%) in favor of Bago. Converting lists in list-of-lists to list of Numpy ndarrays improved timing by ~10%. As JoshAdel mentioned, this construct doesn't vectorize well and Cython is probably the best bet. – Henry Thornton Jan 03 '12 at 08:23
  • @dbv, you're right, if you have to convert everything to ndarrays it's often not worth it. Is it possible for you to generate the list of lists as a list of ndarrays? That way you avoid the conversion, which is most of the time in both my method and the histogram method? Depending on where the lists come from, you might also gain a little bit on the front end by generating them as ndarrays to begin with. – Bi Rico Jan 03 '12 at 15:16
  • @Bago. I used i) original list-of-lists (time = 100%); ii) converted the list-of-lists to list of ndarrays (time difference ~5% less); and iii) generated the lists as ndarrays to begin with (time difference ~10% less). – Henry Thornton Jan 03 '12 at 18:28
  • Well, we did what we could. It sounds like you have a lot of profiling and maybe some C/Cython in your future. I'll leave you to get to it. – Bi Rico Jan 03 '12 at 18:37
  • In this particular case, a simple for-loop delivered best performance. – Henry Thornton Apr 06 '12 at 14:21
2

Using your example input:

>>> list_list_values = [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8], 
                        [0, 1], [21, 31, 41], [9, 11, 22, 33, 44], [17, 19]]
>>> list_comp_values = [0.7, 9.8, 1.2, 5, 10, 11.7, 6, 0.2]
>>> list_range_values = [1, 3, 5]

First, some generator shenanigans:

>>> indices_weights = ((list_list_values[i], list_comp_values[i]) 
                       for i in list_range_values)
>>> flat_indices_weights = ((i, weight) for indices, weight in indices_weights 
                             for i in indices)

Now we pass the data to numpy. I can't figure out how to produce a rec.array from an iterator, so I had to convert the above generator to a list. Maybe there's a way to avoid that...

>>> i_w = numpy.rec.array(list(flat_indices_weights),       
                          dtype=[('i', int), ('weight', float)])
>>> numpy.histogram(i_w['i'], bins=range(0, 100), weights=i_w['weight'])
(array([  0. ,   0. ,   0. ,   0. ,   5. ,   9.8,   0. ,  14.8,   5. ,
         0. ,   0. ,   9.8,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,  11.7,   0. ,   0. ,   0. ,   9.8,   0. ,
         0. ,   0. ,   0. ,   0. ,  11.7,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,  11.7,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   9.8]), 
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]))
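
As an aside, one possible way to avoid the list conversion, sketched here under the assumption that `numpy.fromiter` accepts a structured dtype (it does on reasonably recent NumPy): the generator can then be consumed directly, producing a plain structured array rather than a rec.array, with field access by name working the same:

>>> i_w = numpy.fromiter(flat_indices_weights,
...                      dtype=[('i', int), ('weight', float)])
>>> i_w['i']
array([ 5,  7, 11, 25, 99,  4,  7,  8, 21, 31, 41])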

I had a moment to follow up on JoshAdel's tests with a couple of my own. The fastest solution so far uses Bago's set-up but replaces the sum_by_group function with the built-in histogram function. Here are the numbers I got (updated):

Method1 (jterrace) : 2.65

Method2 (for loop) : 2.25

Method3 (Bago) : 1.14

Method4 (histogram) : 2.82

Method5 (3/4 combo) : 1.07

Note that as implemented here, the first method gives incorrect results according to my test. I didn't have time to figure out what the problem is. The code for my test is below; it makes only minor adjustments to JoshAdel's original code, but I post it in full for convenience. (Updated to incorporate Bago's comments and somewhat de-kludged.)

from timeit import Timer

setstr="""import numpy as np
import itertools
import random

Nlists = 1000
total_lists = 5000
outsz = 100
maxsublistsz = 100

# create random list of lists
list_range_values = random.sample(xrange(total_lists),Nlists)
list_list_values = [random.sample(xrange(outsz),np.random.randint(1,maxsublistsz)) for k in xrange(total_lists)]

list_comp_values = list(10*np.random.uniform(size=(total_lists,)))

v = np.zeros((outsz,))

def indices(start, end):
    lens = end - start
    np.cumsum(lens, out=lens)
    i = np.ones(lens[-1], dtype=int)
    i[0] = start[0]
    i[lens[:-1]] += start[1:]
    i[lens[:-1]] -= end[:-1]
    np.cumsum(i, out=i)
    return i

def sum_by_group(values, groups):
    order = np.argsort(groups)
    groups = groups[order]
    values = values[order]
    values.cumsum(out=values)
    index = np.ones(len(groups), 'bool')
    index[:-1] = groups[1:] != groups[:-1]
    values = values[index]
    groups = groups[index]
    values[1:] = np.diff(values)
    return values, groups


"""

setstr_test = setstr + "\nprint_v = True\n"

method1="""
list_list_lens = np.array(map(len, list_list_values))
comp_vals_expanded = np.repeat(list_comp_values, list_list_lens)

list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values),dtype=int)
list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1]))

toadd = indices(list_list_starts[list_range_values],(list_list_starts + list_list_lens)[list_range_values])

v[list_vals_flat[toadd]] += comp_vals_expanded[toadd]
"""

method2="""
for k in list_range_values:
    v[list_list_values[k]] += list_comp_values[k]
"""

method3="""
llv = [np.fromiter(list_list_values[i], 'int') for i in list_range_values]
lcv = [list_comp_values[i] for i in list_range_values]
counts = map(len, llv)
indices = np.concatenate(llv)
values = np.repeat(lcv, counts)

totals, indices_unique = sum_by_group(values, indices)
v[indices_unique] += totals
"""

method4="""
indices_weights = ((list_list_values[i], list_comp_values[i]) for i in list_range_values)
flat_indices_weights = ((i, weight) for indices, weight in indices_weights for i in indices)
i_w = np.rec.array(list(flat_indices_weights), dtype=[('i', 'i'), ('weight', 'd')])
v += np.histogram(i_w['i'], bins=range(0, outsz + 1), weights=i_w['weight'], new=True)[0]
"""

method5="""
llv = [np.fromiter(list_list_values[i], 'int') for i in list_range_values]
lcv = [list_comp_values[i] for i in list_range_values]
counts = map(len, llv)
indices = np.concatenate(llv)
values = np.repeat(lcv, counts)

v += np.histogram(indices, bins=range(0, outsz + 1), weights=values, new=True)[0]
"""


t1 = Timer(method1,setup=setstr).timeit(100)
print t1

t2 = Timer(method2,setup=setstr).timeit(100)
print t2

t3 = Timer(method3,setup=setstr).timeit(100)
print t3

t4 = Timer(method4,setup=setstr).timeit(100)
print t4

t5 = Timer(method5,setup=setstr).timeit(100)
print t5

exec(setstr_test + method1 + "\nprint v\n")
exec("\nv = np.zeros((outsz,))\n" + method2 + "\nprint v\n")
exec("\nv = np.zeros((outsz,))\n" + method3 + "\nprint v\n")
exec("\nv = np.zeros((outsz,))\n" + method4 + "\nprint v\n")
exec("\nv = np.zeros((outsz,))\n" + method5 + "\nprint v\n")
senderle
  • Note that to avoid having 98 and 99 in the same bin, you'd have to use `bins=range(0, 101)` instead. – senderle Jan 02 '12 at 20:06
  • I changed the setup a little bit, `llv = [np.fromiter(list_list_values[i], 'int') for i in list_range_values]`. Passing a list of ndarrays to concatenate seems significantly faster than passing a list of lists. The histogram approach is a good idea. Unless the index is very sparse, it's much better if you can avoid having an index on the left side of the assignment. – Bi Rico Jan 03 '12 at 02:28
  • @Bago, thanks for pointing that out -- it makes a big difference! I've updated the timings and code. – senderle Jan 03 '12 at 14:48
1

First, set up the variables you gave:

import numpy as np
list_range_values = [1, 3, 5]
list_list_values = [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45],
                    [4, 7, 8], [0, 1], [21, 31, 41]]
list_comp_values = [0.7, 9.8, 1.2, 5, 10, 11.7]
v = np.arange(100, dtype=float)

Next, list_list_values and list_comp_values need to be flattened so they are contiguous:

list_list_lens = np.array(map(len, list_list_values))
comp_vals_expanded = np.repeat(list_comp_values, list_list_lens)
import itertools
list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values),
                             dtype=int)

Then, the starting indices of each subarray are needed:

list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1]))
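
(With the example values above, list_list_lens is array([3, 5, 2, 3, 2, 3]) and list_list_starts works out to array([ 0,  3,  8, 10, 13, 15]).)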

Now that we have both the starting and ending values, we can use the indices function from this question to get an array of selector indices:

def indices(start, end):
    lens = end - start
    np.cumsum(lens, out=lens)
    i = np.ones(lens[-1], dtype=int)
    i[0] = start[0]
    i[lens[:-1]] += start[1:]
    i[lens[:-1]] -= end[:-1]
    np.cumsum(i, out=i)
    return i

toadd = indices(list_list_starts[list_range_values],
                (list_list_starts + list_list_lens)[list_range_values])
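
(For the example, toadd works out to array([ 3,  4,  5,  6,  7, 10, 11, 12, 15, 16, 17]): exactly the flattened positions of sublists 1, 3 and 5.)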

Now that we've done all this magic, the array can be added to like this:

v[list_vals_flat[toadd]] += comp_vals_expanded[toadd]
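
A caveat, raised in Bi Rico's comment below: the fancy-indexed `+=` above is buffered, so if an index repeats within `list_vals_flat[toadd]`, only one of the additions for that index takes effect. On NumPy 1.8 or later (newer than this answer), a minimal sketch of an unbuffered alternative:

# np.add.at applies the addition once per occurrence of each index,
# so duplicates accumulate instead of overwriting each other
np.add.at(v, list_vals_flat[toadd], comp_vals_expanded[toadd])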
jterrace
  • Thank you. The results are identical to the for-loop. Your solution is similar to Bago's, which is a tad cleaner for our needs. – Henry Thornton Jan 02 '12 at 19:33
  • @dbv and @jterrace. Have you guys checked to see if this works when there are repeats in `list_vals_flat[toadd]`? Unless I'm missing something this method might have a bug. – Bi Rico Jan 02 '12 at 21:04