83

I have found some answers to this question before, but they seem to be obsolete for the current Python versions (or at least they don't work for me).

I want to check if a substring is contained in a list of strings. I only need the boolean result.

I found this solution:

word_to_check = 'or'
wordlist = ['yellow','orange','red']

result = any(word_to_check in word for word in worldlist)

From this code I would expect to get a True value. If the word was "der", then the output should be False.

However, the result is a generator function, and I can't find a way to get the True value.

Any idea?

martineau
Álvaro
  • 4
    The code you posted works fine (except for `wordlist`/`worldlist`). I'm guessing you forgot the `any()` call when you tried it before. – Gareth Latty May 05 '13 at 00:51
  • I missed that you already used `any`. – Ashwini Chaudhary May 05 '13 at 00:52
  • Taking a look at your code and comments, I think the problem is the "any" function I am using. It is probably the any function in the numpy module. So the solution would be to use the built-in function instead, but any idea on how to do this once the numpy module has been imported? – Álvaro May 05 '13 at 00:54
  • @DSM, it does. Just tried it on Python 3.3. – Mark Tolonen May 05 '13 at 00:58
  • @DSM np.any(a for a in b) returns a generator – askewchan May 05 '13 at 00:58
  • @MarkTolonen, askewchan: I'm looking at a bool right now, so something must have changed between 1.6.2 and 1.7+. – DSM May 05 '13 at 01:01
  • You guys were right: with the built-in function it works perfectly, and the problem is the numpy one. The only problem with Mark's suggestion is that working within "ipython --pylab" imports numpy directly, so Ashwini's solution fits perfectly. Thanks a lot! – Álvaro May 05 '13 at 01:03
  • 4
    This problem comes up for me all the time when using `ipython --pylab`, which "helpfully" imports * from numpy for you. In that case you can directly use `__builtin__.any` without having to import `__builtin__` like in Ashwini's answer, since `__builtin__` shows up in interactive shells automatically. Also @DSM: apparently the behavior of `numpy.any` changed (for the worse) in 1.7. – Danica May 05 '13 at 01:03
  • @DSM, I'm using a 64-bit unofficial numpy 1.7.1 from http://www.lfd.uci.edu/~gohlke/pythonlibs/, so that could also be the issue. – Mark Tolonen May 05 '13 at 01:04
  • 2
    Also, see the new answer below that shows a much faster alternative approach by combining the words into a single string. – Raymond Hettinger May 05 '13 at 05:08

4 Answers

67

Posted code

The OP's posted code using any() is correct and should work. The spelling of "worldlist" needs to be fixed though.
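For reference, a minimal sketch of the question's code with the variable name spelled consistently. It assumes the built-in any() is in scope, that is, nothing such as a star-import has rebound the name:

```python
word_to_check = 'or'
wordlist = ['yellow', 'orange', 'red']

# The built-in any() consumes the generator expression and returns a plain bool.
result = any(word_to_check in word for word in wordlist)
print(result)  # True

# No word in the list contains 'der'.
missing = any('der' in word for word in wordlist)
print(missing)  # False
```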

Alternate approach with str.join()

That said, there is a simple and fast solution to be had by using the substring search on a single combined string:

>>> wordlist = ['yellow','orange','red']
>>> combined = '\t'.join(wordlist)

>>> 'or' in combined
True
>>> 'der' in combined
False

For short wordlists, this is several times faster than the approach using any.

And if the combined string can be precomputed before the search, the in-operator search will always beat the any approach even for large wordlists.
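The two approaches can be compared with a rough sketch like the one below. The helper names with_any and with_join are made up for illustration, and the timings depend on the machine, the Python version, and the size of the list, so it is worth measuring on real data rather than trusting any single number:

```python
import timeit

wordlist = ['yellow', 'orange', 'red']
combined = '\t'.join(wordlist)  # precomputed once, reused for every search

def with_any(sub):
    # Hypothetical helper: checks each word separately, stops at the first hit.
    return any(sub in word for word in wordlist)

def with_join(sub):
    # Hypothetical helper: one substring search over the precomputed string.
    return sub in combined

# Both give the same answer as long as the query never contains the separator.
for sub in ('or', 'der'):
    assert with_any(sub) == with_join(sub)

# Time the worst case for any(): a substring that is not present at all.
t_any = timeit.timeit(lambda: with_any('der'), number=10_000)
t_join = timeit.timeit(lambda: with_join('der'), number=10_000)
print(f"any: {t_any:.4f}s  join: {t_join:.4f}s")
```

One caveat of the joined string: a query that itself contains the separator can match across a word boundary. For example, 'w\to' is found in the combined string above even though no single word contains it.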

Alternate approach with sets

The O(n) search speed can be reduced to O(1) if a substring set is precomputed in advance and if we don't mind using more memory.

Precomputed step:

from itertools import combinations

def substrings(word):
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i : j]

wordlist = ['yellow','orange','red']
word_set = set().union(*map(substrings, wordlist))

Fast O(1) search step:

>>> 'or' in word_set
True
>>> 'der' in word_set
False
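To get a feel for the extra memory this costs, here is a small sketch that counts substrings per word. The helper max_substrings is a made-up name for illustration; it relies on the fact that a word of length n has at most n * (n + 1) / 2 non-empty substrings:

```python
from itertools import combinations

def substrings(word):
    # Every pair i < j of cut positions yields one non-empty slice word[i:j].
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i:j]

def max_substrings(word):
    # Hypothetical helper: upper bound on the number of distinct substrings.
    # Repeated letters make the real set smaller than this bound.
    n = len(word)
    return n * (n + 1) // 2

for word in ['yellow', 'orange', 'red']:
    distinct = len(set(substrings(word)))
    print(word, distinct, max_substrings(word))
```

Since the count grows quadratically with word length, the set approach probably only pays off for short words that are searched many times.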
Raymond Hettinger
  • 7
    This is by far the most useful and simple solution in my opinion. It can also be shortened to one line: `'or' in '\t'.join(wordlist)` – mjp Aug 02 '16 at 17:46
  • 3
    Much faster than going through the list and using 'in' expression on each item – tonysepia Apr 18 '18 at 15:24
  • 1
    why use '\t' instead of ' '? – Nihar Karve May 24 '20 at 15:18
  • @NiharKarve Because the string being searched for is more likely to contain ' ' than '\t' – Homunculus Reticulli Jun 30 '21 at 11:09
  • 1
    @Raymond Although I do agree that the join method is perhaps clearer, it is **not** faster than using `any()`. At best, it is the same speed (when the substring is not in the list). If the word exists in the list, `any` will short circuit and will not check the remainder of the list. For very large lists, this can be several orders of magnitude faster than joining. – Chris Collett Jan 14 '22 at 20:29
  • @mjp The one-liner is nice if the wordlist is only searched once; otherwise, it is best to precompute the *join()* step so that recurring searches only use the in-operator. Also, it reads a little bit better as two separate steps :-) – Raymond Hettinger Jan 28 '22 at 17:44
  • 1
    @tonysepia Any separator can be used as long as it doesn't occur in the wordlist. A '\t' tab is a safe choice. A space will work most of the time unless cases like "de facto" and "de jure" are treated as word units. – Raymond Hettinger Jan 28 '22 at 17:52
49

You can import any from __builtin__ in case it was replaced by some other any:

>>> from __builtin__ import any as b_any
>>> lst = ['yellow', 'orange', 'red']
>>> word = "or"
>>> b_any(word in x for x in lst)
True

Note that in Python 3 __builtin__ has been renamed to builtins.
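On Python 3 the same idea might look like the sketch below. The alias b_any follows the snippet above; the rest assumes only the standard builtins module:

```python
from builtins import any as b_any  # Python 3 name for Python 2's __builtin__

lst = ['yellow', 'orange', 'red']
word = 'or'

# b_any refers to the built-in even if the bare name `any` was rebound,
# for example by a star-import from another module.
result = b_any(word in x for x in lst)
print(result)  # True
```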

Ashwini Chaudhary
19

You could use next instead:

colors = ['yellow', 'orange', 'red'] 
search = "or"

result = next((True for color in colors if search in color), False)

print(result) # True

To show the string that contains the substring:

colors = ['yellow', 'orange', 'red'] 
search = "or"

result = [color for color in colors if search in color]  

print(result) # ['orange']
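If only the first matching string is needed, the two ideas can be combined. This is a sketch, and first_match is a made-up helper name: next() takes one item from the generator and stops, and its second argument is returned when nothing matches:

```python
colors = ['yellow', 'orange', 'red']

def first_match(search, words):
    # Hypothetical helper: returns the first word containing `search`,
    # or None when no word matches. Stops scanning at the first hit.
    return next((w for w in words if search in w), None)

print(first_match('or', colors))   # orange
print(first_match('der', colors))  # None
```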
Zero Piraeus
stderr
  • 1
    That looks like a great way to find the objects with the substring, and could be used also for the True/False objective checking the length of the resulting array. – Álvaro May 05 '13 at 02:44
0

Also, if you want to check whether any of the values of a dictionary occurs as a substring in a list of strings, you can use this:

list_a = [
    'Copy of snap-009ecf9feb43d902b from us-west-2',
    'Copy of snap-0fe999422014504b6 from us-west-2',
    'Copy of snap-0fe999422014cscx504b6 from us-west-2',
    'Copy of snap-0fe999422sdad014504b6 from us-west-2'
]
dict_b = {
    '/dev/xvda': 'snap-0fe999422014504b6',
    '/dev/xvdsdsa': 'snap-sdvcsdvsdvs'
}

for b1 in dict_b.values():
    # "found" if any string in list_a contains this value, else "not found"
    result = next(("found" for a1 in list_a if b1 in a1), "not found")
    print(result)

It prints

found
not found
Kostas Demiris