What is a good Pythonic way to check whether any of a list of substrings occurs in any of a list of strings, like the following:

if 'sub1' in str1 or 'sub2' in str1 or ... 'subN' in str1 or\
   'sub1' in str2 or 'sub2' in str2 or ... 'subN' in str2 or\
   ...
   'sub1' in strM or 'sub2' in strM or ... 'subN' in strM:

One way is to combine them with a generator expression, like this:

strList = [str1, str2, ..., strM]
subList = ['sub1', ..., 'subN']
if any(sub in s for sub in subList for s in strList):

Is there anything better, like a library function perhaps, to absorb one of the dimensions?

Thank you very much.

gt6989b
  • If the substrings are short enough (and not too many), you could compose them into a regular expression: `sub1|sub2|sub3|...|subN` – alexis Jun 21 '13 at 12:14
  • Depending on the number and size of strings, it might be faster to concatenate all `str[1..n]`s into one large string and then check for substring matches once, using regex or `any(... in ...)`. – Tim Pietzcker Jun 21 '13 at 12:30
  • @TimPietzcker that's an interesting thought. But the advantage for my specific case is that substrings are static, while strings are dynamic, so I can precompute the substring RE once and use it all over the place. But this is still quite original, thank you very much. – gt6989b Jun 21 '13 at 12:38
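The concatenation idea from the comments could be sketched roughly as follows. This is only an illustration: the helper name `any_match_in_joined` and the newline separator are assumptions, and the separator must not occur in any substring, otherwise a match could span two neighbouring strings.

```python
import re

def any_match_in_joined(matcher, strings, sep="\n"):
    """Search all strings in one pass by joining them first.

    Assumes `sep` appears in none of the substrings, so a match
    can never straddle the boundary between two strings.
    """
    return matcher.search(sep.join(strings)) is not None

# Static substrings: compile once, reuse for every batch of strings.
sub_list = ["sub1", "sub2"]
matcher = re.compile("|".join(re.escape(s) for s in sub_list))

print(any_match_in_joined(matcher, ["xx sub2 yy", "zz"]))
print(any_match_in_joined(matcher, ["su", "b1"]))
```

Whether this beats searching each string separately depends on the number and size of the strings, so it is worth timing on real data.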

2 Answers

You could compile the substrings into a regular expression, and use that to search each string. If you don't have so many substrings that the RE exceeds internal limits, this is probably the most efficient way.

import re

# Escape each substring so regex metacharacters are matched literally
pattern = "|".join(re.escape(s) for s in subList)
crexp = re.compile(pattern)
if any(crexp.search(s) for s in strList):
    ...
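For a version that can be run as-is, here is a minimal sketch with concrete sample data. The function names and sample values are made up for illustration. The guard for an empty substring list is an addition: without it the joined pattern would be the empty string, which matches every input.

```python
import re

def compile_substring_matcher(substrings):
    """Build one compiled pattern that matches any of the given substrings."""
    if not substrings:
        # An empty alternation would match everything; "(?!)" never matches.
        return re.compile(r"(?!)")
    # re.escape keeps characters such as '.' or '*' literal.
    return re.compile("|".join(re.escape(s) for s in substrings))

def any_string_matches(matcher, strings):
    """Return True if the pattern is found in at least one string."""
    return any(matcher.search(s) for s in strings)

# Hypothetical sample data; "a.b" shows why escaping matters.
matcher = compile_substring_matcher(["foo", "a.b"])

print(any_string_matches(matcher, ["xxfooxx", "bar"]))
print(any_string_matches(matcher, ["aXb", "bar"]))
```

The first call finds `foo`; the second finds nothing, because the escaped `.` in `a.b` no longer matches an arbitrary character. Since the substrings are static, the matcher can be built once and reused for every new list of strings.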
alexis

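As described in this answer, a regular expression would be the way to go, since in theory it can be modeled as a DFA that checks for all substrings at the same time. (Python's `re` module is a backtracking matcher rather than a true DFA, so the speedup is not guaranteed and is worth measuring.) You should probably read that answer, as it is quite in-depth.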

Imre Kerr