
For Python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements?

In PHP, this was explicitly stated but I can't find a similar note for Python.

the Tin Man
wag2639
  • Avoid regex at all costs! ...Until absolutely necessary... – jathanism Apr 14 '11 at 20:03
  • @jathanism: I respectfully disagree. I avoided regex for decades until I finally took the time to sit down and actually _learn_ them. Now I can't live without them. Regular expressions are _extremely_ useful for many day-to-day tasks and should be a familiar tool in every programmer's toolbox. – ridgerunner Apr 14 '11 at 20:27
  • @ridgerunner: Agreed, but it is also important to know *when* to use them. For simple string manipulations such as this, regular expressions are over the top. My rule of thumb is that if you can do it with the built-in string functions (`split()`, `replace()`, `find()` et al.) without needing multiple status variables, complicated slicing, etc., you should. If it starts getting complex, then you move on to alternative tools such as regular expressions. – Blair Apr 14 '11 at 23:32
  • Oh, and a general comment on the speed of regular expressions: it depends on the context. In a script you run occasionally with a few regular expressions, you won't notice the overhead. On the other hand, in a script that does intensive or high-volume processing, you might find the overhead unacceptable if you use regular expressions heavily. This is where profiling is important to determine where the bottleneck is (and I suppose I should trot out the *premature optimisation is the root of all evil* line at this point too). – Blair Apr 14 '11 at 23:37
  • @Blair: I wholeheartedly agree. But many seem to be averse to regex because they find them "difficult", and this is simply because they have not taken the time to learn them beyond a superficial level. Yes, if a simple string replace solves the problem, then by all means use that (it is also very likely the fastest solution). But I see way too many convoluted, complex string manipulation solutions to problems that are easily solved with a single, _well crafted_ regex. – ridgerunner Apr 15 '11 at 00:41
  • @ridgerunner: I didn't say not to use regex. It really depends on your use case. I think anyone who has to do parsing--and we all end up doing parsing at some point--will agree that you simply can't live without regex, but you can (and should) avoid it whenever possible. – jathanism Apr 15 '11 at 14:34

4 Answers


As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
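To illustrate the escaping pitfall, here is a small sketch in Python 3 (the sample string is made up): `str.replace()` treats `.` literally, while `re.sub()` treats it as "any character" unless the pattern is escaped with `re.escape()`.

```python
import re

version = "2.5.6"

# str.replace() treats its arguments as plain text.
plain = version.replace(".", "-")

# re.sub() reads "." as "any character", so every character is replaced.
unescaped = re.sub(".", "-", version)

# re.escape() turns the pattern back into a literal match.
escaped = re.sub(re.escape("."), "-", version)

print(plain)      # 2-5-6
print(unescaped)  # -----
print(escaped)    # 2-5-6
```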

Sven Marnach

str.replace() should be used whenever possible. It's more explicit, simpler, and faster.

In [1]: import re

In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""

In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop

In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop
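Outside IPython, a comparable measurement can be made with the standard-library `timeit` module. This is a sketch in Python 3. The loop count, the best-of-3 repeat (chosen to mirror IPython's report), and the extra precompiled-pattern variant are illustrative choices, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit

text = ("For python 2.5, 2.6, should I be using string.replace or re.sub "
        "for basic text replacements.\n"
        "In PHP, this was explicitly stated but I can't find a similar note "
        "for python.\n")

# Precompiling skips the pattern-cache lookup that re.sub() does on each call.
pattern = re.compile("e")

# All three approaches should produce identical output.
assert text.replace("e", "X") == re.sub("e", "X", text) == pattern.sub("X", text)

n = 20_000
candidates = {
    "str.replace": lambda: text.replace("e", "X"),
    "re.sub": lambda: re.sub("e", "X", text),
    "compiled.sub": lambda: pattern.sub("X", text),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, converted to nanoseconds per call.
    best = min(timeit.repeat(func, number=n, repeat=3))
    results[name] = best / n * 1e9
    print(f"{name:12s} {results[name]:8.1f} ns per call")
```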
chmullig
  • Out of curiosity, how were you executing `timeit` in your example output? Is that something special to IPython allowing you to use that syntax? (Oh, and +1!) – jathanism Apr 15 '11 at 14:31
  • Yup, IPython includes it magically. http://scienceoss.com/test-the-speed-of-your-code-interactively-in-ipython/ – chmullig Apr 15 '11 at 14:59
  • Unsure if this is a typo or I'm missing something, but your str.replace() run has 10x as many loops as the regex run. – BoltzmannBrain Jan 08 '16 at 01:01
  • @alavin89 IPython chooses a "fitting value" for the iteration count if one is not specified (https://ipython.org/ipython-doc/3/interactive/magics.html#magic-timeit). It's possible that the value it chooses scales based on the time it takes to execute the snippet some small number of times. Since the timing numbers it reports are per loop, the difference in loop counts does not matter significantly. – NasaGeek Jun 07 '16 at 20:48
  • What if you had chained multiple `replace` calls vs. a single regex? At some point a single regex replace should be faster than N chained `replace` calls on a string, no? – radtek Oct 06 '17 at 21:15
  • Very interesting but confusingly presented. Besides the loop counts differing by a factor of 10, the time-per-loop units also differ (`us` vs `ns`): `text.replace` took 735 nanoseconds and `re.sub` took 5,520 nanoseconds, which is about 7.5 times slower. – Edward Mar 08 '22 at 16:19
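On the chained-`replace` question: one common way to do N replacements in a single pass is to join the escaped keys into one alternation and look each match up in a dict. The sketch below is Python 3, and `chained_replace`, `multi_replace`, and `swap` are hypothetical names. It also shows a behavioural difference: chained calls can re-replace text produced by an earlier call, while the single pass cannot. Whether the single regex is actually faster depends on N and the input, so it is worth timing both.

```python
import re

def chained_replace(text, replacements):
    # Each call rescans the whole string and sees earlier results.
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

def multi_replace(text, replacements):
    # Longest keys first, so "ab" wins over "a" when both could match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    # One scan; each match is looked up in the dict.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

swap = {"cat": "dog", "dog": "cat"}
print(chained_replace("cat chases dog", swap))  # cat chases cat
print(multi_replace("cat chases dog", swap))    # dog chases cat
```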

String manipulation is usually preferable to regex when you can figure out how to adapt it. Regex is incredibly powerful, but it's usually slower, and usually harder to write, debug, and maintain.

That being said, notice the amount of "usually" in the above paragraph! It's possible (and I've seen it done) to write a zillion lines of string manipulation for something you could've done with a 20-character regex. It's also possible to waste valuable time using "efficient" string functions on tasks a good regex engine could do almost as fast. Then there's maintainability: Regex can be horribly complex, but sometimes a regex will be simpler and easier to read than a giant block of procedural code.
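As a small, made-up illustration of that trade-off (Python 3): masking every run of digits in a string takes a hand-written loop with a state flag, but only one short pattern with `re.sub()`. The function names are hypothetical.

```python
import re

def mask_digits_manual(text):
    # Walk the string, emitting "#" once per run of consecutive digits.
    # str.isdecimal() matches the same characters as \d in a str pattern.
    out = []
    in_digits = False
    for ch in text:
        if ch.isdecimal():
            if not in_digits:
                out.append("#")
                in_digits = True
        else:
            out.append(ch)
            in_digits = False
    return "".join(out)

def mask_digits_regex(text):
    # r"\d+" matches each run of digits as a single unit.
    return re.sub(r"\d+", "#", text)

sample = "Order 1234 shipped on 2011-04-14"
print(mask_digits_manual(sample))  # Order # shipped on #-#-#
print(mask_digits_regex(sample))   # Order # shipped on #-#-#
```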

Regex is fantastic for its intended purpose: searching for highly-variable needles in highly-variable haystacks. Think of it as a precision torque wrench: It's the perfect tool for a specific set of jobs, but it makes a lousy hammer.

Some guidelines you should follow when you aren't sure what to use:

If the answer to any of these questions is "yes", you probably want string manipulation. Otherwise, consider regex.

Justin Morgan - On strike

Another thing to consider is that if you're doing many single-character replacements or deletions at once, str.translate() might be what you're looking for.
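A minimal sketch of `str.translate()` in Python 3, where the table is built with `str.maketrans()`. In Python 2.5/2.6 the table comes from `string.maketrans()` instead, and the characters to delete are passed as the second argument to `translate()`. The sample text and mapping are made up; the chained `replace()` calls are only there for comparison.

```python
# Map a->4, e->3, o->0 and delete "!" (third argument), all in one pass.
table = str.maketrans("aeo", "430", "!")

text = "hello world!"
translated = text.translate(table)

# The equivalent chain of replace() calls scans the string once per character.
chained = (text.replace("a", "4").replace("e", "3")
               .replace("o", "0").replace("!", ""))

print(translated)  # h3ll0 w0rld
print(chained)     # h3ll0 w0rld
```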

jathanism