I'm trying to sort through a set of matchers and a set of strings such that if I have

matchers = ['foo', 'bar', 'abc']

and

strings = ['afooa', 'zbarz', 'abcabc', 'purple', 'foobar']

I'd like to get every element of strings that contains any element of matchers as a substring, so that

results = ['afooa', 'zbarz', 'abcabc', 'foobar']

ideally without just resorting to nested for-loops.

I've looked around for a while, but this is kind of a hard question to frame in searchable terms, so any advice on how to search for it would also be much appreciated.

dtsavage
  • "ideally without just resorting to nested for-loops" Why not? Because it's ugly and you think that there should be a cleaner, shorter way (as in a nested list comprehension), or is performance a problem, and you are looking for a faster approach, maybe something with search trees or the like? – tobias_k Nov 08 '16 at 21:47
  • You could create a regular expression to match `foo|bar|abc` – Peter Wood Nov 08 '16 at 21:47
  • Reopened this because there are multiple fundamentally different approaches to the problem (for example the regex-based one). However, I'm pretty sure there's a more specific duplicate lying around somewhere, I just don't really want to try to find it right now. – Karl Knechtel Mar 07 '23 at 20:23

2 Answers

results = [s for s in strings if any(m in s for m in matchers)]

Explanation: We iterate through strings and keep each element if any element of matchers is contained in it.
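For comparison, here is a rough sketch of the explicit nested loops that the comprehension replaces. The helper name `filter_by_substring` is hypothetical and only used for illustration. `any` stops at the first matcher that hits, which the `break` mirrors.

```python
matchers = ['foo', 'bar', 'abc']
strings = ['afooa', 'zbarz', 'abcabc', 'purple', 'foobar']


def filter_by_substring(strings, matchers):
    """Hypothetical helper: keep each string that contains at least one matcher."""
    results = []
    for s in strings:
        for m in matchers:
            if m in s:
                results.append(s)
                break  # one hit is enough, the same short-circuit any() performs
    return results


# The one-line comprehension from the answer.
results = [s for s in strings if any(m in s for m in matchers)]

# Both approaches select the same strings, in the same order.
assert results == filter_by_substring(strings, matchers)
print(results)  # ['afooa', 'zbarz', 'abcabc', 'foobar']
```

Both versions still do the nested iteration; the comprehension is just shorter to write and read.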

Rok Povsic

You could use a regular expression (RegEx) if you don't want to code loops:

>>> import re

>>> pattern = re.compile('|'.join(matchers))

This creates a RegEx foo|bar|abc which will match any of those.
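One caveat with joining the matchers directly: a regex metacharacter in a matcher (such as `.` or `+`) would be treated as pattern syntax. A minimal sketch, assuming the matchers are meant as literal substrings, escapes each one with `re.escape` first. The names `raw_matchers`, `candidates`, `unescaped` and `escaped` are made up for this example.

```python
import re

raw_matchers = ['a.b', 'c+']             # hypothetical matchers containing metacharacters
candidates = ['a.b', 'axb', 'c+', 'ccc']

# Joined as-is: '.' matches any character and '+' means "one or more".
unescaped = re.compile('|'.join(raw_matchers))

# Escaped first: every matcher is treated as a literal substring.
escaped = re.compile('|'.join(re.escape(m) for m in raw_matchers))

unescaped_hits = list(filter(unescaped.search, candidates))
escaped_hits = list(filter(escaped.search, candidates))

print(unescaped_hits)  # ['a.b', 'axb', 'c+', 'ccc']
print(escaped_hits)    # ['a.b', 'c+']
```

For plain alphanumeric matchers such as foo, bar and abc, escaping changes nothing, so it is safe to apply unconditionally.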

We can use the compiled pattern's search method along with filter:

>>> list(filter(pattern.search, strings))
['afooa', 'zbarz', 'abcabc', 'foobar']

search returns a match object, or None if there is no match. A match object is truthy and None is falsy, so filter keeps exactly the strings in which the pattern is found somewhere (it doesn't matter what the match is exactly).

If we used match instead, it would only match at the start of the string:

>>> list(filter(pattern.match, strings))
['abcabc', 'foobar']
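To make the difference concrete, here is a small sketch that reuses the question's data and records which alternative each method finds per string. The helper `matched_text` and the dictionaries `search_hits` and `match_hits` are names invented for this illustration.

```python
import re

matchers = ['foo', 'bar', 'abc']
strings = ['afooa', 'zbarz', 'abcabc', 'purple', 'foobar']
pattern = re.compile('|'.join(matchers))


def matched_text(match):
    """Hypothetical helper: the matched substring, or None when there was no match."""
    return match.group() if match else None


# search scans the whole string; match only tries position 0.
search_hits = {s: matched_text(pattern.search(s)) for s in strings}
match_hits = {s: matched_text(pattern.match(s)) for s in strings}

for s in strings:
    print(s, search_hits[s], match_hits[s])
# afooa foo None
# zbarz bar None
# abcabc abc abc
# purple None None
# foobar foo foo
```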
Peter Wood