2

Is there a more efficient way to compare lists of strings than using a for loop?

I would like to check whether each string in x occurs in the strings of y (anywhere within each y string).

x = ['a1' , 'a2', 'bk']
y = ['a1aa' , 'a2lop' , 'bnkl', 'a1sss', 'flask']
for i in x:
    print([i in str_y for str_y in y])

Results:

[True, False, False, True, False]
[False, True, False, False, False]
[False, False, False, False, False]

5 Answers

2

Use list comprehensions:

In [4]: [[b in a for a in y] for b in x]
Out[4]:
[[True, False, False, True, False],
 [False, True, False, False, False],
 [False, False, False, False, False]]

Testing the timing:

 %timeit print([[b in a for a in y] for b in x])
<lots of printing>
228 µs ± 5.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
 %timeit for i in x:   print([i in str_y for str_y in y])
<lots of printing>
492 µs ± 4.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So half the time.
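
The timings above include the cost of print. A minimal sketch for timing only the comparisons, assuming the standard-library timeit module; the helper names with_loop and with_comprehension are hypothetical and introduced only for this illustration. No particular result is implied, since the numbers depend on the machine and Python version:

```python
import timeit

x = ['a1', 'a2', 'bk']
y = ['a1aa', 'a2lop', 'bnkl', 'a1sss', 'flask']

def with_loop():
    # Original approach: one inner list per element of x, built in a for loop.
    rows = []
    for i in x:
        rows.append([i in str_y for str_y in y])
    return rows

def with_comprehension():
    # Nested list comprehension producing the same rows.
    return [[b in a for a in y] for b in x]

# Both variants must produce identical results before timing is meaningful.
assert with_loop() == with_comprehension()

# Time each variant without any printing inside the measured code.
loop_time = timeit.timeit(with_loop, number=10_000)
comp_time = timeit.timeit(with_comprehension, number=10_000)
print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```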

Steve Barnes
  • 1
    That won't make a difference in terms of computational efficiency. – AKX May 02 '18 at 10:35
  • @AKX Surprisingly it does! – Steve Barnes May 02 '18 at 10:41
  • Only because you're doing IO in larger blocks, I think. It's an apples-and-oranges comparison, though, since the printed output is not identical to the original solution. – AKX May 02 '18 at 10:42
  • 1
    Replacing `print` with a dummy `pass` function, the solutions are practically just as fast. (orig: 2.6698 / comprehensions 2.6660) – AKX May 02 '18 at 10:46
1

You can use itertools.product to get all the results in one list:

In [61]: x = ['a1' , 'a2', 'bk']
    ...: y = ['a1aa' , 'a2lop' , 'bnkl', 'a1sss', 'flask']
    ...: 

In [62]: from itertools import product
    ...: [i in j for i, j in product(x, y)]

Or as a functional approach you can use starmap and product together:

from itertools import product, starmap
from operator import contains

# contains(a, b) means "b in a", so the haystacks (y) are passed first
list(starmap(contains, product(y, x)))
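
Each of these calls returns one flat list of len(x) * len(y) booleans rather than one row per element of x. A minimal sketch of regrouping the product(x, y) result into rows; the names flat and rows and the slicing step are illustrative additions, not part of the answer above:

```python
from itertools import product

x = ['a1', 'a2', 'bk']
y = ['a1aa', 'a2lop', 'bnkl', 'a1sss', 'flask']

# product(x, y) yields pairs with x varying slowest, so the flat list
# holds all results for x[0], then all results for x[1], and so on.
flat = [i in j for i, j in product(x, y)]

# Slice the flat list into chunks of len(y) to recover one row per element of x.
rows = [flat[k:k + len(y)] for k in range(0, len(flat), len(y))]
print(rows)
```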

Also, a vectorized, but not very optimized, approach (with y converted to a NumPy array) is as follows:

In [139]: (np.core.defchararray.find(y[:,None], x) != -1).T
Out[139]: 
array([[ True, False, False,  True, False],
       [False,  True, False, False, False],
       [False, False, False, False, False]])
Mazdak
1

You can just use a list comprehension.

x = ['a1' , 'a2', 'bk']
y = ['a1aa' , 'a2lop' , 'bnkl', 'a1sss', 'flask']
print([[xi in z for z in y] for xi in x])
Mihai Alexandru-Ionut
0

No, I don't think so; there is no straightforward way that wouldn't require a lot of precomputation anyway.

If all you need to know is whether one of the needles is in each haystack, use any(). A good old for loop with break is even faster:

needles = ['a1', 'a2', 'bk']
haystacks = ['a1aa', 'a2lop', 'bnkl', 'a1sss', 'flask']

for haystack in haystacks:
    for needle in needles:
        if needle in haystack:
            print((needle, haystack))
            break  # Break if finding one match is enough
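
The any() variant mentioned above could look like the following sketch, which reuses the needles and haystacks names from the loop; the matches name is an illustrative addition. It records only whether each haystack contains at least one needle, not which one:

```python
needles = ['a1', 'a2', 'bk']
haystacks = ['a1aa', 'a2lop', 'bnkl', 'a1sss', 'flask']

# any() stops at the first needle found in a haystack, like the break in the loop.
matches = [any(needle in haystack for needle in needles) for haystack in haystacks]
print(matches)
```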
AKX
0

I think the only solution is a for loop. If you want to keep the code on one line, the best option is:

print([[xx in i for i in y] for xx in x])
toom501