86

For one-off string searches, is it faster to simply use str.find/rfind than re.match/search?

That is, for a given string, s, should I use:

if s.find('lookforme') > -1:
    do something

or

if re.match('lookforme',s):
    do something else

?

Mike Caron
  • 3
    For a one off, I'm pretty sure regex would be slower, because of the extra overhead. – Thomas K Feb 04 '11 at 18:35
  • 2
    You should be careful comparing the two, as they have different functionality. Find searches the entire string, whereas match matches the beginning only (i.e. it can exit early, depending on the data). So you're comparing apples and oranges there. – Zoran Pavlovic Jun 01 '16 at 11:16

7 Answers

187

The question of which is faster is best answered by using timeit.

from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    # Note: re.match only tests the start of the string; re.search scans all of it.
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
        pass

print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text)", "from __main__ import re_find; string='lookforme'; text='look'"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))

The output (total seconds for timeit's default of 1,000,000 calls) is:

0.441393852234
2.12302494049
0.251421928406

So you should use the in operator not only because it is easier to read, but also because it is faster.
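One caveat when reproducing this: the three calls are not strictly equivalent. re.match only tests the start of the string, while str.find and in scan all of it. A minimal sketch of the difference, using a made-up sample string:

```python
import re

s = "please lookforme here"  # hypothetical sample: the target is not at the start

# str.find and `in` scan the whole string.
found_by_find = s.find("lookforme") > -1
found_by_in = "lookforme" in s

# re.match is anchored at position 0, so it misses a hit in the middle.
found_by_match = re.match("lookforme", s) is not None

# re.search scans the whole string, like find and `in`.
found_by_search = re.search("lookforme", s) is not None

print(found_by_find, found_by_in, found_by_match, found_by_search)
```

If a regex is compared against find or in, re.search is the like-for-like call.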

Stephan
user225312
  • 5
    Micro-optimizations at best to choose based on time. However, +1 since you specified the most readable case... – ircmaxell Feb 04 '11 at 18:37
  • 5
    You certainly answered the question properly, sukhbir. Agreed, though +1 for readability and that you proved the answer is the "pythonic" one. – Mike Caron Feb 04 '11 at 18:49
  • 4
    Clearly you led the automatic jury by calling one "best_find" ;-) – Thomas K Feb 04 '11 at 18:51
  • 34
    Just so you know, the main thing slowing down the regexp here is having to compile the pattern every time. If the pattern is precompiled and used over and over again, matching is only about 25% slower than using find. – Justin Peel Feb 04 '11 at 23:30
  • For more complex scenarios (eg, I want to do something with the index if it exists, should I use `if x in X` before `X.find(x)`? and this is maybe dependent on the actual use case, I test the options inside my actual code using: ''' import time \ time_start = time.time() \ # option 1 \ print("Finished: %.4f sec" % (time.time() - time_start)) ''' – brita_ Mar 05 '16 at 17:21
  • Do note the dot operator is pretty slow in itself; if you can work around it, you'll gain some good speed in intense operations where this answer can be used. And Python magic methods such as `__contains__` can be faster if referenced directly (no dot operator) rather than wrapped with `in`. – Tcll Jul 20 '16 at 03:15
  • Isn't that output surprisingly slow for a single find operation? Or is it not measured in seconds? – Matthew Woo Feb 20 '17 at 20:49
  • 6
    Is `re.match()` still slower than (consecutive) `in`s for patterns with many possibilities? e.g. `a|b|c|d|e|f` (pre-compiled pattern). – Aralox Aug 14 '17 at 07:14
  • 1
    @Aralox If you have multiple strings to match, with `in` you need a loop that runs `if text in string` once per string, while `re.match()` runs only once. In my testing, if the number of strings to match is large, say over 100, `re.match()` can be faster – oeter Apr 04 '23 at 08:24
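The multi-term case raised in the last two comments can be timed with a sketch like the one below. The term list, the sample text, and the helper names with_in and with_regex are made up for illustration. It uses re.search so that both variants scan the whole string, and it makes no claim about which one wins on your data.

```python
import re
from timeit import timeit

needles = ["alpha", "beta", "gamma", "delta"]  # hypothetical search terms
text = "some text that mentions gamma near the end"

# One compiled pattern covering all terms; re.escape guards special characters.
any_needle = re.compile("|".join(re.escape(n) for n in needles))

def with_in(text):
    # One substring scan per term, stopping at the first hit.
    return any(n in text for n in needles)

def with_regex(text):
    # A single call using the alternation pattern.
    return any_needle.search(text) is not None

# Both approaches must agree before their timings are worth comparing.
assert with_in(text) == with_regex(text)

print("in loop:", timeit(lambda: with_in(text), number=100_000))
print("regex:  ", timeit(lambda: with_regex(text), number=100_000))
```

Growing the needles list is the interesting experiment here, since the comment above reports that the balance shifted only with a large number of terms.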
21

To address the concern about regex compilation time raised on the most up-voted answer, here is a version with a precompiled pattern:

from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text_re):
    if text_re.match(string):
        pass

def best_find(string, text):
    if text in string:
        pass

print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text_re)", "from __main__ import re_find; string='lookforme'; import re; text_re=re.compile('look')"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))

And my numbers:

0.189274072647
0.239935874939
0.0820939540863

The precompiled pattern improves the regex numbers, but in is still the fastest.

Narann
19

Use this:

if 'lookforme' in s:
    do something

Regexes need to be compiled first, which adds some overhead. Python's normal string search is very efficient anyway.

If you search for the same term many times, or need something more complex, then regexes become more useful.
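As a rough illustration of "something more complex", here is a sketch of a case-insensitive whole-word search, which a plain substring test cannot express. The pattern and the sample lines are invented for the example.

```python
import re

# Precompile once and reuse for every line (hypothetical pattern and data).
word = re.compile(r"\bcat\b", re.IGNORECASE)

lines = ["The Cat sat.", "concatenate strings", "no match here"]

# Substring test: also hits "concatenate".
substring_hits = [line for line in lines if "cat" in line.lower()]

# Regex with word boundaries: only the standalone word.
whole_word_hits = [line for line in lines if word.search(line)]

print(substring_hits)
print(whole_word_hits)
```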

Jochen Ritzel
  • 4
    +1 First be pythonic - then, if performance becomes an issue, explore different implementations to see if they improve performance. – Andrew Hare Feb 04 '11 at 18:30
12

Maybe someone is still interested. The given answers seem fine but only look at a very short string. With a long string and the pattern located near the end, the timings appear to shift in favor of regex. Keep in mind, though, that re.match only tests the start of the string, so in that case it returns without finding the pattern.

import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
       pass

very_long_string = 'sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df 
adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa 
svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa 
dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf 
lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd'
pattern = 'look'
print('pattern at the end of string')
print('find:', end=' ')
%timeit find(very_long_string + pattern, pattern)
print('regex:', end=' ')
%timeit re_find(very_long_string + pattern, pattern)
print('in:', end=' ')
%timeit best_find(very_long_string + pattern, pattern)
print('pattern in front of string')
print('find:', end=' ')
%timeit find(pattern + very_long_string, pattern)
print('regex:', end=' ')
%timeit re_find(pattern + very_long_string, pattern)
print('in:', end=' ')
%timeit best_find(pattern + very_long_string, pattern)

which gives the output:

pattern at the end of string
find: 3.41 µs ± 74.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
regex: 1.93 µs ± 23.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
in: 3.32 µs ± 74.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
pattern in front of string
find: 748 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
regex: 2.03 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
in: 589 ns ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

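Summary: the time taken by find and in depends on the string length and on where the pattern sits in the string. The regex timing looks independent of string length only because re.match tests just the start of the string: with the pattern at the end it fails immediately and never finds it, so that row is not a like-for-like comparison (re.search would be the equivalent call).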
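To check this outside Jupyter, a sketch along the following lines may help. The filler string is a shortened stand-in for the long string above, and the repeat counts are arbitrary. It first shows what re.match and re.search return when the pattern sits at the end, then times the three calls that really do scan the string.

```python
import re
from timeit import timeit

filler = "sasgda;dlaskjgasdlj sadlf;jsaf " * 200  # stand-in for the long string
pattern = "look"
at_end = filler + pattern

# re.match is anchored at the start, so it gives up at once and finds nothing.
match_result = re.match(pattern, at_end)
# re.search scans the string and does find the pattern.
search_result = re.search(pattern, at_end)
print(match_result, search_result is not None)

compiled = re.compile(pattern)
candidates = [
    ("in", lambda: pattern in at_end),
    ("find", lambda: at_end.find(pattern) > -1),
    ("re.search", lambda: compiled.search(at_end) is not None),
]
for name, stmt in candidates:
    print(name, timeit(stmt, number=10_000))
```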

Community
PaPeK
8

I've had the same problem. I used Jupyter's %timeit to check:

import re
sent = "a sentence for measuring a find function"
sent_list = sent.split()
print("x in sentence")
%timeit "function" in sent
print("x in token list")
%timeit "function" in sent_list

print("regex search")
%timeit bool(re.match(".*function.*", sent))
print("compiled regex search")
regex = re.compile(".*function.*")
%timeit bool(regex.match(sent))

x in sentence 61.3 ns ± 3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

x in token list 93.3 ns ± 1.26 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

regex search 772 ns ± 8.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

compiled regex search 420 ns ± 7.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Compiling the regex helps, but the simple in is still much faster.
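Note that the first two timings above answer slightly different questions: in on a string is a substring test, while in on a token list compares whole tokens. A small sketch with the same sentence:

```python
sent = "a sentence for measuring a find function"
sent_list = sent.split()

substring_hit = "func" in sent          # substring test on the string
token_hit = "func" in sent_list         # equality test against each token
full_token_hit = "function" in sent_list

print(substring_hit, token_hit, full_token_hit)
```

Which one is right depends on whether partial words should count as a match.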

Yair Beer
7

re.compile speeds up regexes a lot if you are searching for the same thing over and over. But I just got a huge speedup by using "in" to cull out bad cases before I match. Anecdotal, I know.

Ben
1

In addition to the answers above: re.search() and re.match() take the same runtime.

if(re.search(rf"\b{re.escape(some_keyword)}\b",some_sentence))

takes the same runtime as

if(re.search(rf"\b{re.escape(some_keyword)}\b",some_sentence))

If your regex necessarily requires some word to match, it is better to cut down on regex comparisons by first checking with a plain "if ... in" search. For example, the following is faster than the above two and gives the same result:

if(some_keyword.lower() in some_sentence.lower()):
  if(re.search(rf"\b{re.escape(some_keyword)}\b",some_sentence)):
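The same prefilter idea can be wrapped in a small helper. This is only a sketch: the name contains_word is made up, and it uses a case-sensitive in test as the prefilter (instead of the .lower() variant above), on the assumption that the regex itself is case-sensitive.

```python
import re

def contains_word(keyword, sentence):
    """Return True if keyword occurs in sentence as a whole word."""
    # Cheap substring test first; most non-matching sentences stop here.
    if keyword not in sentence:
        return False
    # Only now pay for the regex, which enforces the word boundaries.
    return re.search(rf"\b{re.escape(keyword)}\b", sentence) is not None

print(contains_word("cat", "the cat sat"))
print(contains_word("cat", "concatenate"))
print(contains_word("dog", "the cat sat"))
```

The prefilter never changes the result, because any sentence the regex matches must also contain the keyword as a substring.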