32

Possible Duplicate:
Check if multiple strings exist in another string

I am trying to find out if there is a nice and clean way to test for 3 different strings.

Basically I am looping trough a file using a for loop; then I have to check if it contains 1 of the 3 strings that I have set in a list.

So far I have found the multiple if condition check, but it does not feel like is really elegant and efficient:

for line in file
    if "string1" in line or "string2" in line or "string3" in line:
        print "found the string"

I was thinking like creating a list that contains string1, string2 and string3, and check if any of these is contained in the line, but it doesn't seems that i can just compare the list without explicitly loop trough the list, and in that case I am basically in the same conditions as in the multiple if statement that I wrote above.

Is there any smart way to check against multiple strings without writing long if statements or loop trough the elements of a list?

Community
  • 1
  • 1
user1006198
  • 4,681
  • 6
  • 21
  • 18

3 Answers3

77
strings = ("string1", "string2", "string3")
for line in file:
    if any(s in line for s in strings):
        print "yay!"
Niklas B.
  • 92,950
  • 18
  • 194
  • 224
14

This still loops through the cartesian product of the two lists, but it does it one line:

>>> lines1 = ['soup', 'butter', 'venison']
>>> lines2 = ['prune', 'rye', 'turkey']
>>> search_strings = ['a', 'b', 'c']
>>> any(s in l for l in lines1 for s in search_strings)
True
>>> any(s in l for l in lines2 for s in search_strings)
False

This also have the advantage that any short-circuits, and so the looping stops as soon as a match is found. Also, this only finds the first occurrence of a string from search_strings in linesX. If you want to find multiple occurrences you could do something like this:

>>> lines3 = ['corn', 'butter', 'apples']
>>> [(s, l) for l in lines3 for s in search_strings if s in l]
[('c', 'corn'), ('b', 'butter'), ('a', 'apples')]

If you feel like coding something more complex, it seems the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string. (Thanks to Niklas B. for pointing that out.) I still think it would result in quadratic performance for your use-case since you'll still have to call it multiple times to search multiple lines. However, it would beat the above (cubic, on average) algorithm.

senderle
  • 145,869
  • 36
  • 209
  • 233
  • 1
    Actually there is. Check out the Aho-Corasick automaton. It can be done in linear time – Niklas B. Mar 19 '14 at 17:28
  • @NiklasB., thanks, that's quite interesting! Correct me if I'm wrong, but I think the result would still be quadratic, since the OP wants to test multiple lines for substring matches. But that still beats the naive `any` version (cubic, assuming average O(n) performance of `in`). – senderle Mar 19 '14 at 23:38
  • No it's linear. Building the automaton is linear time and feeding a string into the automaton is linear time as well – Niklas B. Mar 19 '14 at 23:39
  • Or to be more precise, you build the automaton from the needles once and reuse it for every line – Niklas B. Mar 19 '14 at 23:58
  • I thought the automaton was built from the needles... – senderle Mar 19 '14 at 23:59
  • yeah that's what I meant... – Niklas B. Mar 20 '14 at 00:00
  • Ok, but then that's `O(i * (n + m)) + O(s)`, where `i` is the number of input strings, `n` is the average length of an input string, `s` is the total number of characters in the search strings, and `m` is the average number of matches. – senderle Mar 20 '14 at 00:05
  • But if you just want a yes/no answer then it's O(s + i*n) = O(s + N) where N is the total input length :P Whatever, the details are not too important – Niklas B. Mar 20 '14 at 00:07
4

One approach is to combine the search strings into a regex pattern as in this answer.

Janne Karila
  • 24,266
  • 6
  • 53
  • 94
  • I think a regex is prettier, but doesn't regex have quite a lot of overhead? Can it really be more efficient than a simpler 'if ... or ... or ...:' ? – Sam Redway Oct 27 '17 at 09:14
  • @SamRedway There is some overhead, but on the other hand the search strings can be combined into one pattern that is processed "in one go". – Janne Karila Oct 27 '17 at 10:16
  • @SamRedway Hmm ok, a quick test makes the regex look bad in comparison – Janne Karila Oct 27 '17 at 10:36
  • 1
    So I would guess that in most use cases readability is more important than efficiency and this is a totally reasonable approach based on that metric. Would say though that the 'more efficient' may be misleading. – Sam Redway Oct 27 '17 at 10:38
  • @SamRedway Yeah, edited that out now. – Janne Karila Oct 27 '17 at 10:50