0

Suppose you have a string that you want to normalize into a specific format, that is, replace every ' ', '.', '-', etc. with '_'.

I know that I could do this:

>>> s = "Hello----.....    World"
>>> s = s.replace('-', '_').replace('.', '_').replace(' ', '_')
>>> print(s)
Hello_____________World

That gives me what I want. But is there a cleaner, more pythonic way? I tried passing a list as the first argument of replace, but that didn't work.

Pavlos Panteliadis

4 Answers

2

Use Regular Expressions.

Ex:

import re

s = "Hello----.....    World"
print(re.sub(r"[ .-]", "_", s))

Here is the Python tutorial.
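The character class above can also be built from a string of characters instead of being written by hand. This is only a minimal sketch: `replace_chars` is a hypothetical helper name, not something from the answer, and it assumes `chars` is non-empty. `re.escape` keeps characters such as '.', '-' and ']' literal inside the brackets.

```python
import re

def replace_chars(text, chars, replacement="_"):
    """Replace every character of `chars` that occurs in `text` (hypothetical helper)."""
    # Escape the characters so that '.', '-', ']' etc. are taken literally
    # inside the character class.
    pattern = re.compile("[" + re.escape(chars) + "]")
    return pattern.sub(replacement, text)

result = replace_chars("Hello----.....    World", " .-")
print(result)
```

Writing the class by hand as `[ .-]` works because a hyphen placed last in a character class is treated as a literal rather than as a range.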

Laurent LAPORTE
1

Use re

>>> import re
>>> print(re.sub(r' |\.|-', '_', "Hello----.....    World"))
Hello_____________World

Bonus solution not using regex:

>>> keys = [' ', '.', '-']
>>> print(''.join('_' if c in keys else c for c in "Hello----.....    World"))
Hello_____________World
ospahiu
1

You can do it with str.translate and string.maketrans (string.maketrans exists only in Python 2). This is the most efficient approach, since it makes a single pass over the string instead of chaining calls:

In [6]: from string import maketrans

In [7]: s = "Hello----.....    World"

In [8]: table = maketrans(' .-',"___")

In [9]: print(s.translate(table))
Hello_____________World
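On Python 3 the import above fails, because `maketrans` is a static method of `str` there. A minimal sketch of the same approach, assuming Python 3:

```python
s = "Hello----.....    World"

# Two equal-length strings: each character of the first maps to the
# character at the same position in the second.
table = str.maketrans(" .-", "___")
result = s.translate(table)
print(result)

# Equivalent table built from a dict, which avoids repeating the underscore.
table_from_dict = str.maketrans(dict.fromkeys(" .-", "_"))
result_from_dict = s.translate(table_from_dict)
```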

The timings:

In [12]: %%timeit
   ....: s = "Hello----.....    World"
   ....: table = maketrans(' .-',"___")
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 1.14 µs per loop

In [13]: timeit  s.replace('-','_').replace('.', '_').replace(' ', '_')
100000 loops, best of 3: 2.2 µs per loop
In [14]: %%timeit
   ....: text = "Hello----.....    World"
   ....: for ch in [' ', '.', '-']:
   ....:     if ch in text:
   ....:         text = text.replace(ch,'_')
   ....: 
100000 loops, best of 3: 3.51 µs per loop

In [18]: %%timeit
....: s = "Hello----.....    World"
....: re.sub(r"[ .-]", "_", s)
....: 
100000 loops, best of 3: 11 µs per loop

Even with the pattern pre-compiled, the regex still takes around 10 µs, so it is by far the least efficient approach here.

In [20]: patt=  re.compile(r"[ .-]")

In [21]: %%timeit
   ....: s = "Hello----.....    World"
   ....: patt.sub("_", s)
   ....: 
100000 loops, best of 3: 9.98 µs per loop

Creating the table ahead of time brings translate down to nanoseconds:

In [22]: %%timeit
   ....: s = "Hello----.....    World"
   ....: s.translate(table)
   ....: 

1000000 loops, best of 3: 590 ns per loop
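The IPython timings above can be reproduced outside IPython with the standard `timeit` module. This is a rough sketch assuming Python 3 (hence `str.maketrans`); the `candidates` dict and the loop count are illustrative choices, and the absolute numbers will differ from those quoted above depending on machine and interpreter version.

```python
import re
import timeit

s = "Hello----.....    World"
table = str.maketrans(" .-", "___")   # built once, outside the timed code
patt = re.compile(r"[ .-]")           # compiled once, outside the timed code

candidates = {
    "translate": lambda: s.translate(table),
    "chained replace": lambda: s.replace("-", "_").replace(".", "_").replace(" ", "_"),
    "compiled regex": lambda: patt.sub("_", s),
}

# Sanity check: every approach must produce the same string.
outputs = {name: func() for name, func in candidates.items()}
assert len(set(outputs.values())) == 1

number = 10_000
for name, func in candidates.items():
    # Take the best of three runs, as %%timeit reports.
    best = min(timeit.repeat(func, number=number, repeat=3))
    print(f"{name:16s} {best / number * 1e6:.3f} µs per call")
```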
Padraic Cunningham
0

This answer lays out a variety of different ways to accomplish this task, contrasting different functions and inputs by speed.

If you are replacing only a few characters, the fastest way is the one in your question, chaining multiple replace calls; regular expressions are the slowest.

If you want something more 'pythonic' that balances speed and readability, make a list of the characters you want to replace and loop through it.

text = "Hello----.....    World"
for ch in [' ', '.', '-']:
    if ch in text:
        text = text.replace(ch,'_')
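The loop can also be wrapped in a small function so the character list is passed in. This is a minimal sketch in which `replace_each` is a hypothetical name; it drops the `if ch in text` check, since `str.replace` simply returns the string unchanged when the character does not occur.

```python
def replace_each(text, chars, replacement="_"):
    """Replace each character in `chars` with `replacement` (hypothetical helper)."""
    for ch in chars:
        # No membership test needed: replace() is a no-op when ch is absent.
        text = text.replace(ch, replacement)
    return text

result = replace_each("Hello----.....    World", [" ", ".", "-"])
print(result)
```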
nook
  • This is incorrect: chaining is not the fastest way. Also, the `if ch in text` check is redundant; you can just write `text = text.replace(ch,'_')`, because nothing is replaced when there is nothing to replace. – Padraic Cunningham Aug 20 '16 at 15:46