3

I have a number of lists that have values of either 1 or 0 and have length 1024. I would like to find the number of times that two given lists overlap over all indexes (but only if they match with values == 1), but can't seem to think of way that keeps the number of comparisons low. Currently my approach is to get all indexes of my lists where the value == 1, and get the intersection of two lists, e.g.

#for each list, do the following
for x,j in enumerate(list1): 
    if j == 1:
       idx_list.append(x) 
# compare  two lists 
num_overlap = set(idx_list1).intersection(idx_list2)

Is this the most efficient way to find this value?

For example input/output (only showing 6 values instead of 1024):

list1 = [1 0 1 0 0 0]
list2 = [1 0 0 0 0 0]
num_overlap = 1 (both lists have ```1``` at index 0) 
DrTchocky
  • 515
  • 1
  • 5
  • 14

2 Answers2

6

Just zip lists together, and apply all on the zipped result to see if it's all non-zeroes (if both elements of the list are "truthy". Issue 1 if it's the case. Sum the generator comprehension.

list1 = [1,0,1,0,0,0]
list2 = [1,0,0,0,0,0]

num_overlap = sum(1 for t in zip(list1,list2) if all(t))

Note: works with any number of lists that you can feed to zip.

Variant: since all(t) evaluates to 1, the code can be shortened a little bit, and we can even use map here to avoid the loop:

num_overlap = sum(map(all,zip(list1,list2)))

Benchmarking both solutions on a great deal of iterations:

2.3228490352630615 (gencomp)
2.1401889324188232 (map)

And the suggested solution using sum(x and y for x,y in zip(list1,list2)) is faster because there's no overhead from calling all

1.9283719062805176

(if you want to generalize with more than 2 lists you cannot use that last one, but if you only have 2 lists, it's the fastest option)

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
  • 3
    Slightly more readable version (personally): `sum(x and y for x,y in zip(list1,list2))`. – UltraInstinct Mar 12 '17 at 22:25
  • Agreed. Also, I believe the generator expression is actually faster than map. – idjaw Mar 12 '17 at 22:26
  • `x and y` is probably faster if you only have 2 lists, I reckon. – Jean-François Fabre Mar 12 '17 at 22:26
  • 1
    @idjaw I wouldn't bet on that. `map` uses an internal loop. But calling `all` has probably some overhead. Needs benchmarking. – Jean-François Fabre Mar 12 '17 at 22:28
  • 1
    @Jean-FrançoisFabre Agreed about `all`. There is a very interesting discussion here. A bit old, but I believe Python 3 is the topic here. Looks like there are cases for both out-performing the other. Interesting nonetheless! http://stackoverflow.com/a/1247490/1832539. But then there is this, which I haven't noticed before. Pretty cool! http://stackoverflow.com/a/13483314/1832539 – idjaw Mar 12 '17 at 22:31
  • of course `map` beats gencomp only for python 3! else the generated list kills the performance. And Alex Martelli answer is good. Use `map` only without lambda, otherwise it loses its interests (Alex Martelli is one of the main python coders BTW) – Jean-François Fabre Mar 12 '17 at 22:33
  • From a readability point of view I really like the first solution. – Elmex80s Mar 12 '17 at 22:38
2

If you are just interested in the number of times, you can use numpy arrays, multiply them (the product is entrywise) and then sum the product. You will have 1 in the product only if the corresponding entries in both arrays were 1. Here is an example:

import numpy as np    
a=np.array([1,0,0,1])
b=np.array([1,0,1,0])
sum(a*b)

Then a * b=[1,0,0,0], and sum(a * b)=1.

Miriam Farber
  • 18,986
  • 14
  • 61
  • 76