I need to replace non-numeric chars from a string.
For example, "8-4545-225-144" needs to be "84545225144"; "$334fdf890==-" must be "334890".
How can I do this?
I need to replace non-numeric chars from a string.
For example, "8-4545-225-144" needs to be "84545225144"; "$334fdf890==-" must be "334890".
How can I do this?
''.join(c for c in S if c.isdigit())
filter(str.isdigit, s)
is faster and IMO clearer than anything else listed here.
It will also throw a TypeError if s is a unicode type. Depending on what definition of "digits" you want, this can be more or less useful than the alternative filter(type(s).isdigit, s)
, slightly slower but still faster than the re and comprehension versions for me.
Edit: Although if you are a poor sucker stuck with Python 3, you will need to use "".join(filter(str.isdigit, s))
which puts you firmly in the realm of equivalently bad performance. Such progress.
Let's time the join
and the re
versions:
In [3]: import re
In [4]: def withRe(theString): return re.sub('\D', '', theString)
...:
In [5]:
In [6]: def withJoin(S): return ''.join(c for c in S if c.isdigit())
...:
In [11]: s = "8-4545-225-144"
In [12]: %timeit withJoin(s)
100000 loops, best of 3: 6.89 us per loop
In [13]: %timeit withRe(s)
100000 loops, best of 3: 4.77 us per loop
The join
version is much nicer, compared to the re
one, but unfortunately is 50% slower. So if the performance is an issue, the elegance might need to be sacrificed.
EDIT
In [16]: def withFilter(s): return filter(str.isdigit, s)
....:
In [19]: %timeit withFilter(s)
100000 loops, best of 3: 2.75 us per loop
It looks like filter
is the performance and readability winner
Although a little more complicated to set up, using the translate()
string method to delete the characters as shown below can as much as 4-6 times faster than using join()
or re.sub()
according to timing tests I performed -- so if it is something done many times, you might want to consider using this instead.
nonnumerics = ''.join(c for c in ''.join(chr(i) for i in range(256)) if not c.isdigit())
astring = '123-$ab #6789'
print astring.translate(None, nonnumerics)
# 1236789
I prefer regular expressions, so here's a way if you like
import re
myStr = '$334fdf890==-'
digts = re.sub('[^0-9]','',myStr)
This should replace all nonnumeric occurences with '' i.e. with nothing. So digts variable should be '334890'