
Fastest and most elegant way to check whether any element matching a regular expression is in a given list.

For example, given a list and a pattern:

import re

newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')

In the question "Regular Expressions: Search in list", the suggested approach

list(filter(regex.match,newlist))

gives me a list:

['this','thas']

However, I just want a True or False result. The method above is therefore inefficient, since it looks through every element of newlist even after a match has been found. Is there something like

'this' in newlist

that efficiently and elegantly checks whether any element matching a regular expression is in a given list?

maplemaple
  • Using `regex` in the first place is maybe not the most efficient method... – Allan Jan 31 '19 at 02:03
  • Use `any()` rather than `filter()`? – Loocid Jan 31 '19 at 02:03
  • Make a single string out of _newlist_: `,this,thiis,thas,sada,`. Keep it, then run a _findall()_, or a single match using a newly constructed regex `,(th.s),`. This will give you `['this','thas']` or True/False without having to iterate. –  Jan 31 '19 at 02:17
  • @Loocid Do you mean any(filter(regex.match,newlist))? It still looks through all elements of newlist to return "True or False", right? – maplemaple Jan 31 '19 at 02:25
  • @Loocid Do you mean that in Python 3 filter() gives me an iterator rather than a full list, so when I compose it with any(), it will not go through every element of newlist to return True unless it's the worst case? – maplemaple Jan 31 '19 at 02:36
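The joined-string idea from the comments can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the commenter's exact code: the helper `any_match_joined` is hypothetical, it uses a newline instead of a comma as the separator (assuming no element contains `'\n'` and the pattern cannot match `'\n'`), and it relies on `re.MULTILINE` so that `^` and `$` mark each element's boundaries. Like `,(th.s),`, it requires the whole element to match, unlike `re.match`. The regex engine still scans the joined string internally, so the iteration is moved, not avoided.

```python
import re

def any_match_joined(pattern, items):
    # Hypothetical helper based on the joined-string idea from the comments.
    # Assumes no item contains '\n' and that the pattern cannot match '\n'.
    if not items:
        return False  # avoid a spurious match against the empty joined string
    joined = "\n".join(items)
    # With re.MULTILINE, ^ and $ match at the start and end of every line,
    # so an item counts only if the pattern matches it in full.
    return re.search(rf"^(?:{pattern})$", joined, re.MULTILINE) is not None

newlist = ['this', 'thiis', 'thas', 'sada']
print(any_match_joined(r"th.s", newlist))  # True
print(any_match_joined(r"th.", newlist))   # False
```

To benefit from this, you would build `joined` once and reuse it for many patterns, as the comment suggests; otherwise the join itself already walks the whole list.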

3 Answers


As Loocid suggested, you can use `any()`, which stops at the first truthy value. I would do it with a generator expression, like so:

import re

newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')

result = any(regex.match(word) for word in newlist)
print(result) # True

Here is another version with map that is slightly faster:

result = any(map(regex.match, newlist))
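To check the short-circuiting directly, here is a small sketch that counts how often the match function runs. The wrapper `counting_match` and the `calls` list are hypothetical, added only for illustration.

```python
import re

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')
calls = []  # records every element the wrapper is asked to test

def counting_match(word):
    # Hypothetical wrapper: log the call, then delegate to regex.match.
    calls.append(word)
    return regex.match(word)

# any() over the lazy map object stops after the first truthy result.
print(any(map(counting_match, newlist)))      # True
print(len(calls))                             # 1

calls.clear()
# list() forces filter to test every element.
print(list(filter(counting_match, newlist)))  # ['this', 'thas']
print(len(calls))                             # 4

calls.clear()
# In Python 3, filter is lazy too, so any(filter(...)) also stops early.
print(any(filter(counting_match, newlist)))   # True
print(len(calls))                             # 1
```

One subtle difference: `any(filter(...))` tests the truthiness of the elements themselves, so a match on an empty string `''` is missed, whereas `any(map(...))` tests the match objects, which are always truthy.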
iz_
  • Thank you so much. By the way, I have a question: does any(filter(regex.match,newlist)) have the same effect as your code? I just read that in Python 3 filter gives an iterator, not a full list. – maplemaple Jan 31 '19 at 02:47
  • @maplemaple Yes, but essentially all you are doing with that is checking if the length of `filter(regex.match,newlist)` is greater than 0. I think my version is a bit more explicit. – iz_ Jan 31 '19 at 02:51
  • @maplemaple FYI, I added a slightly faster version with `map`. – iz_ Jan 31 '19 at 02:56
  • There is one more problem. If I use `regex = re.compile('th.')`, the result should be `False`, since no element of the list matches `'th.'` in full. But the code still returns `True`. – maplemaple Jan 31 '19 at 06:23
  • @maplemaple Add `$` to your regex to make it `'th.$'`. `$` means end of string. – iz_ Jan 31 '19 at 06:33
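`re.match` anchors the pattern only at the start of the string, which is why `'th.'` matches the prefix of `'this'`. Below is a minimal sketch of the `$` fix from the comment, plus `re.fullmatch` (available since Python 3.4), which requires the whole string to match without editing the pattern. One difference worth knowing: `$` also matches just before a trailing newline, while `fullmatch` does not.

```python
import re

newlist = ['this', 'thiis', 'thas', 'sada']

# re.match anchors only at the start, so 'th.' matches 'thi' at the start of 'this'.
print(any(map(re.compile('th.').match, newlist)))      # True
# Adding $ anchors the end as well.
print(any(map(re.compile('th.$').match, newlist)))     # False
# fullmatch requires the entire string to match.
print(any(map(re.compile('th.').fullmatch, newlist)))  # False

# Edge case: $ also matches before a trailing newline; fullmatch does not.
print(bool(re.match('th.$', 'the\n')))     # True
print(bool(re.fullmatch('th.', 'the\n')))  # False
```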

This will evaluate the list until it finds the first match.

import re

def search_for_match(items):
    # Return as soon as one element matches; the rest are never checked.
    for item in items:
        if re.match(r"th.s", item):
            return True
    return False

Or to make it more general:

def search_for_match(items, pattern):
    # Same early-exit loop, with the pattern passed in by the caller.
    for item in items:
        if re.match(pattern, item):
            return True
    return False

newlist = ['this','thiis','thas','sada']
found = search_for_match(newlist, r"th.s")
print(found) # True

Just for kicks, I ran these through a timer. Mine lost by a wide margin:

import re
import time

t = time.process_time()
newlist = ['this','thiis','thas','sada']
search_for_match(newlist, r"th.s")
elapsed_time1 = time.process_time() - t
print(elapsed_time1) # 0.00015399999999998748

t2 = time.process_time()
newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')
result = any(regex.match(word) for word in newlist)
elapsed_time2 = time.process_time() - t2
print(elapsed_time2) # 1.1999999999900979e-05

t3 = time.process_time()
newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')
result = any(map(regex.match, newlist))
elapsed_time3 = time.process_time() - t3
print(elapsed_time3) # 5.999999999950489e-06        
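Following the suggestion in the comments, here is a minimal `timeit` sketch that times only the expression under test and repeats each measurement. Each candidate is wrapped in a lambda, which adds the same small call overhead to all three. No results are shown, since they depend on the machine and the Python version.

```python
import re
import timeit

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')

def search_for_match(items, pattern):
    # Loop version, stopping at the first match (pattern given as a string).
    for item in items:
        if re.match(pattern, item):
            return True
    return False

candidates = {
    "loop + re.match": lambda: search_for_match(newlist, r"th.s"),
    "any + generator": lambda: any(regex.match(word) for word in newlist),
    "any + map": lambda: any(map(regex.match, newlist)),
}

for name, func in candidates.items():
    # repeat() runs several independent trials; the minimum is usually the least noisy.
    best = min(timeit.repeat(func, repeat=5, number=10_000))
    print(f"{name}: {best:.4f} s for 10,000 calls")
```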
Carl Brubaker
  • @Tomothy32, I ran them through a timer because I was curious, not because I didn't believe you. Wow, is mine slow. – Carl Brubaker Jan 31 '19 at 03:13
  • @CarlBrubaker No problem. By the way, you're skewing the results by timing everything, not just that one line of code. You should only be timing `search_for_match(newlist, r"th.s")`, `result = any(regex.match(word) for word in newlist)`, and `result = any(map(regex.match, newlist))` individually. The `timeit` module is also a better idea, as it repeats tests more than once and is more accurate in general. Also, as a side note, you can speed up your function by precompiling the regex. – iz_ Jan 31 '19 at 03:40
  • There is one more problem. If I use `regex = re.compile('th.')`, the result should be `False`, since no element of the list matches `'th.'` in full. But the code still returns `True`. – maplemaple Jan 31 '19 at 06:24
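A sketch of the precompiling suggestion from the comments, applied to the loop version. Since the `re` module already caches recently compiled patterns, the gain mainly comes from skipping that cache lookup on every `re.match` call. Accepting either a string or a compiled pattern is a design choice of this sketch, not part of the original answer.

```python
import re

def search_for_match(items, pattern):
    # Compile once up front; re.compile returns an already-compiled
    # Pattern unchanged, so callers may pass either a string or a Pattern.
    regex = re.compile(pattern)
    for item in items:
        if regex.match(item):
            return True  # stop at the first match
    return False

newlist = ['this', 'thiis', 'thas', 'sada']
print(search_for_match(newlist, r"th.s"))              # True
print(search_for_match(newlist, re.compile(r"th.s")))  # True
```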

Besides using any(), I can think of:

next((x for x in newlist if regex.match(x)), False)

This does not return True (it returns the first matching element, or False if nothing matches), but it is probably OK for conditional testing if your list has no empty strings :)
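The empty-string caveat arises because `next()` returns the matching element itself, and `''` is falsy. A minimal variant, assuming you want either the first matching element or an explicit "no match" result, uses `None` as the default and tests with `is not None`:

```python
import re

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')

# next() returns the first matching element, or the default (None) if there is none.
first = next((x for x in newlist if regex.match(x)), None)
print(first)              # this
print(first is not None)  # True

# With an empty-string element, a plain truthiness test gives the wrong answer.
empty_regex = re.compile('x*')
first_empty = next((x for x in [''] if empty_regex.match(x)), None)
print(bool(first_empty))        # False, although '' did match
print(first_empty is not None)  # True
```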

user2468968