Retrieve number of times lists overlap by index

Question

I have a number of lists that have values of either 1 or 0 and have length 1024. I would like to find the number of times that two given lists overlap over all indexes (but only if they match with values == 1), but can't seem to think of way that keeps the number of comparisons low. Currently my approach is to get all indexes of my lists where the value == 1, and get the intersection of two lists, e.g.

#for each list, do the following
for x,j in enumerate(list1): 
    if j == 1:
       idx_list.append(x) 
# compare  two lists 
num_overlap = set(idx_list1).intersection(idx_list2)

Is this the most efficient way to find this value?

For example input/output (only showing 6 values instead of 1024):

list1 = [1 0 1 0 0 0]
list2 = [1 0 0 0 0 0]
num_overlap = 1 (both lists have ```1``` at index 0)

Jean-François Fabre · Accepted Answer · 2017-03-12T22:31:45.177

6

Just zip lists together, and apply all on the zipped result to see if it's all non-zeroes (if both elements of the list are "truthy". Issue 1 if it's the case. Sum the generator comprehension.

list1 = [1,0,1,0,0,0]
list2 = [1,0,0,0,0,0]

num_overlap = sum(1 for t in zip(list1,list2) if all(t))

Note: works with any number of lists that you can feed to zip.

Variant: since all(t) evaluates to 1, the code can be shortened a little bit, and we can even use map here to avoid the loop:

num_overlap = sum(map(all,zip(list1,list2)))

Benchmarking both solutions on a great deal of iterations:

2.3228490352630615 (gencomp)
2.1401889324188232 (map)

And the suggested solution using sum(x and y for x,y in zip(list1,list2)) is faster because there's no overhead from calling all

1.9283719062805176

(if you want to generalize with more than 2 lists you cannot use that last one, but if you only have 2 lists, it's the fastest option)

edited Mar 12 '17 at 22:31

answered Mar 12 '17 at 22:18

Jean-François Fabre

137,073
23
153
219

3

Slightly more readable version (personally): `sum(x and y for x,y in zip(list1,list2))`. – UltraInstinct Mar 12 '17 at 22:25
Agreed. Also, I believe the generator expression is actually faster than map. – idjaw Mar 12 '17 at 22:26
`x and y` is probably faster if you only have 2 lists, I reckon. – Jean-François Fabre Mar 12 '17 at 22:26
1

@idjaw I wouldn't bet on that. `map` uses an internal loop. But calling `all` has probably some overhead. Needs benchmarking. – Jean-François Fabre Mar 12 '17 at 22:28
1

@Jean-FrançoisFabre Agreed about `all`. There is a very interesting discussion here. A bit old, but I believe Python 3 is the topic here. Looks like there are cases for both out-performing the other. Interesting nonetheless! http://stackoverflow.com/a/1247490/1832539. But then there is this, which I haven't noticed before. Pretty cool! http://stackoverflow.com/a/13483314/1832539 – idjaw Mar 12 '17 at 22:31
of course `map` beats gencomp only for python 3! else the generated list kills the performance. And Alex Martelli answer is good. Use `map` only without lambda, otherwise it loses its interests (Alex Martelli is one of the main python coders BTW) – Jean-François Fabre Mar 12 '17 at 22:33
From a readability point of view I really like the first solution. – Elmex80s Mar 12 '17 at 22:38

score 2 · Answer 2 · answered Mar 12 '17 at 22:16

2

If you are just interested in the number of times, you can use numpy arrays, multiply them (the product is entrywise) and then sum the product. You will have 1 in the product only if the corresponding entries in both arrays were 1. Here is an example:

import numpy as np    
a=np.array([1,0,0,1])
b=np.array([1,0,1,0])
sum(a*b)

Then a * b=[1,0,0,0], and sum(a * b)=1.

answered Mar 12 '17 at 22:16

Miriam Farber

18,986
14
61
76

The question is, is this efficient? – Elmex80s Mar 12 '17 at 22:19
3

Is numpy seriously necessary here? That is overkill. To suggest installing `numpy` to solve something so trivial like this does not seem like good advice. – idjaw Mar 12 '17 at 22:21
2

@Elmex80s it's efficient... *once you have numpy imported* (and even then `sum(a==b)` might be clearer). But importing numpy for this alone is definitely overkill and will almost certainly take much longer. – Andras Deak -- Слава Україні Mar 12 '17 at 22:29
can't something be done using `numpy.compress` instead? would avoid the multiplying. – Jean-François Fabre Mar 12 '17 at 22:37
Sorry, my previous comment was wrong, of course you'd need `a&b` (and `np.sum` instead of the vanilla one). @Jean-FrançoisFabre once you have numpy it doesn't really matter: your `map` `2.17 ms`, `sum(a*b)` `969 µs`, `(a*b).sum()` `22.7 µs`, `(a&b).sum()` `19.1 µs` for two random arrays of 10k length. – Andras Deak -- Слава Україні Mar 12 '17 at 23:11

Retrieve number of times lists overlap by index

2 Answers2