Replacing different characters in Python

Question

Suppose you have a string that you want to normalize into a specific format, that is, replace every ' ', '.', '-', etc. with '_'.

I know that I could do this:

>>> s = "Hello----.....    World"
>>> s = s.replace('-','_').replace('.', '_').replace(' ', '_')
>>> print(s)
Hello_____________World

And get what I want. But is there a cleaner, more pythonic way? I tried passing a list as the first argument of replace, but that didn't work.

score 2 · Answer 1 · answered Aug 19 '16 at 20:55

Use regular expressions.

For example:

import re

s = "Hello----.....    World"
print(re.sub(r"[ .-]", "_", s))

Here is the Python tutorial.
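If the characters to replace live in a list, as the question attempted, the character class can be built from that list. The following is only a sketch: `replace_chars` is a hypothetical helper name, not something from the answer above, and `re.escape` is used so that characters with a special meaning inside a class (such as `-` or `]`) are matched literally.

```python
import re


def replace_chars(text, chars, replacement="_"):
    """Replace every character listed in `chars` (hypothetical helper)."""
    if not chars:
        # An empty character class "[]" is not a valid pattern
        return text
    # Escape each character so '-', ']' or '^' stay literal inside [...]
    pattern = "[" + "".join(re.escape(c) for c in chars) + "]"
    return re.sub(pattern, replacement, text)


s = "Hello----.....    World"
print(replace_chars(s, [" ", ".", "-"]))
```

Building the pattern this way keeps the list of characters in one place, at the cost of compiling a pattern on each call (the `re` module caches recently used patterns).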

answered Aug 19 '16 at 20:55

Laurent LAPORTE

ospahiu · Answer 2 · 2016-08-19T20:59:30.970

1

Use re

>>> import re
>>> print(re.sub(r' |\.|-', '_', "Hello----.....    World"))
Hello_____________World

Bonus solution not using regex:

>>> keys = [' ', '.', '-']
>>> print(''.join('_' if c in keys else c for c in "Hello----.....    World"))
Hello_____________World

edited Aug 19 '16 at 20:59

answered Aug 19 '16 at 20:54

ospahiu

Padraic Cunningham · Accepted Answer · 2016-08-20T15:53:44.067

You can do it using str.translate and string.maketrans (on Python 3, str.maketrans), which is the most efficient approach and avoids chaining calls:

In [6]: from string import maketrans

In [7]: s = "Hello----.....    World"

In [8]: table = maketrans(' .-',"___")

In [9]: print(s.translate(table))
Hello_____________World
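The session above is Python 2, where `maketrans` lives in the `string` module. On Python 3 the same idea can be sketched with the built-in `str.maketrans`; the variable name `table_from_dict` is only illustrative.

```python
s = "Hello----.....    World"

# Python 3: maketrans is a static method on str, not a function in `string`
table = str.maketrans(" .-", "___")
print(s.translate(table))

# The same table built from a dict, mapping each character to "_"
table_from_dict = str.maketrans({ch: "_" for ch in " .-"})
print(s.translate(table_from_dict))
```

The dict form avoids having to keep the two argument strings the same length when many characters map to one replacement.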

The timings:

In [12]: %%timeit
   ....: s = "Hello----.....    World"
   ....: table = maketrans(' .-',"___")
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 1.14 µs per loop

In [13]: %timeit s.replace('-','_').replace('.', '_').replace(' ', '_')
100000 loops, best of 3: 2.2 µs per loop

In [14]: %%timeit
   ....: text = "Hello----.....    World"
   ....: for ch in [' ', '.', '-']:
   ....:     if ch in text:
   ....:         text = text.replace(ch,'_')
   ....:
100000 loops, best of 3: 3.51 µs per loop

In [18]: %%timeit
   ....: s = "Hello----.....    World"
   ....: re.sub(r"[ .-]", "_", s)
   ....:
100000 loops, best of 3: 11 µs per loop

Even with the pattern pre-compiled, the regex still takes around 10 µs, so it is by far the least efficient approach:

In [20]: patt = re.compile(r"[ .-]")

In [21]: %%timeit
   ....: s = "Hello----.....    World"
   ....: patt.sub("_", s)
   ....:
100000 loops, best of 3: 9.98 µs per loop

Creating the table ahead of time, outside the timed code, gets us down to a few hundred nanoseconds:

In [22]: %%timeit
   ....: s = "Hello----.....    World"
   ....: s.translate(table)
   ....:

1000000 loops, best of 3: 590 ns per loop
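The timings above come from an IPython session on Python 2. To repeat the comparison outside IPython, a rough sketch using the standard `timeit` module on Python 3 might look like the following. The dictionary of candidates and its labels are illustrative, and absolute numbers (possibly even the ranking) will vary by machine and Python version.

```python
import re
import timeit

s = "Hello----.....    World"
table = str.maketrans(" .-", "___")   # Python 3 spelling of maketrans
patt = re.compile(r"[ .-]")

# Each candidate is a zero-argument callable so timeit can call it directly
candidates = {
    "translate": lambda: s.translate(table),
    "chained replace": lambda: s.replace("-", "_").replace(".", "_").replace(" ", "_"),
    "compiled regex": lambda: patt.sub("_", s),
}

# Sanity check: all approaches must produce the same string
results = {name: func() for name, func in candidates.items()}
assert len(set(results.values())) == 1

number = 100_000
for name, func in candidates.items():
    # Best of 3 runs, converted to time per single call
    best = min(timeit.repeat(func, number=number, repeat=3)) / number
    print(f"{name:16s} {best * 1e9:8.0f} ns per call")
```

Taking the minimum of several repeats mirrors the "best of 3" figure that `%timeit` reports in the session above.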

score 0 · Answer 4 · edited May 23 '17 at 11:53

This answer lays out a variety of different ways to accomplish this task, contrasting different functions and inputs by speed.

If you are replacing only a few characters, the fastest way is the one in your question, chaining multiple replace calls, with regular expressions being the slowest.

If you want to make this more 'pythonic', the best way to balance speed and readability is to make a list of the characters you want to replace and loop through them:

text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')

edited May 23 '17 at 11:53

Community

answered Aug 19 '16 at 21:07

nook

This is incorrect: the fastest way is not chaining. Also, the `if ch in text` check is redundant when you could just do `text = text.replace(ch,'_')`; nothing is replaced if there is nothing there to replace. – Padraic Cunningham Aug 20 '16 at 15:46
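The point about the redundant check can be illustrated with a short sketch. `replace_each` is a hypothetical function name used only for this example; it relies on `str.replace` returning the string unchanged when the character does not occur.

```python
def replace_each(text, chars, replacement="_"):
    """Replace each character in `chars`, one str.replace call per character."""
    for ch in chars:
        # No `if ch in text` guard: replace is a no-op when ch is absent
        text = text.replace(ch, replacement)
    return text


print(replace_each("Hello----.....    World", " .-"))
print(replace_each("HelloWorld", " .-"))  # nothing to replace; prints HelloWorld
```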
