
I need some help removing the digits from this string:

import re

g="C0N4rtist"

re.sub(r'\W','',g)

print(re.sub(r'\W','',g))

The output should look like

CNrtist

but instead it gives me 04

I put this code together after researching online, using http://docs.python.org/2/library/re.html for help. As far as I can tell the code should work, and I have no clue what's wrong. I've already searched online and on Stack Overflow, so an explanation of what's wrong would be very helpful.

  • Or just use a generator expression instead of `re`: `print(''.join(x for x in g if not x in "0123456789"))` – l4mpi Dec 02 '13 at 18:56

2 Answers


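\W matches non-word characters (anything other than letters, digits, and the underscore), so it never matches a digit. Use \d for digits: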

>>> import re
>>> g = "C0N4rtist"
>>> re.sub(r'\d+', '', g)
'CNrtist'
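
For comparison, here is a small sketch of how the three related character classes behave on the same string. The variable names are only for illustration and are not from the question. Note that the `04` output reported in the question is what `\D` (non-digits) would produce, while `\W` leaves this string unchanged:

```python
import re

g = "C0N4rtist"

# \W matches anything that is NOT a letter, digit, or underscore,
# so nothing in g matches and the string comes back unchanged.
no_nonword = re.sub(r'\W', '', g)

# \D matches every non-digit, so only the digits survive.
only_digits = re.sub(r'\D', '', g)

# \d matches digits, which is what the question asks to remove.
no_digits = re.sub(r'\d', '', g)

print(no_nonword, only_digits, no_digits)  # C0N4rtist 04 CNrtist
```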

Note that you don't need regex for this; str.translate is much faster than the regex version (Python 2 signature shown):

>>> from string import digits
>>> g.translate(None, digits)
'CNrtist'
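
The two-argument call above only works on Python 2. On Python 3, `str.translate` takes a single translation table, so a rough equivalent (a sketch, with `delete_digits` and `alt_table` as illustrative names) could look like this:

```python
from string import digits

g = "C0N4rtist"

# With three arguments, str.maketrans treats the third one as the
# set of characters to delete.
delete_digits = str.maketrans('', '', digits)
result = g.translate(delete_digits)
print(result)  # CNrtist

# The same kind of table built by hand: each digit's code point maps to None.
alt_table = dict.fromkeys(map(ord, digits))
alt_result = g.translate(alt_table)
```

Building the table once and reusing it avoids repeating that setup on every call.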

Timings:

>>> g = "C0N4rtist"*100
>>> %timeit g.translate(None, digits)      #winner
100000 loops, best of 3: 9.98 us per loop
>>> %timeit ''.join(i for i in g if not i.isdigit())
1000 loops, best of 3: 507 us per loop
>>> %timeit re.sub(r'\d+', '', g)
1000 loops, best of 3: 253 us per loop
>>> %timeit ''.join([i for i in g if not i.isdigit()])
1000 loops, best of 3: 352 us per loop
>>> %timeit ''.join([i for i in g if i not in digits])
1000 loops, best of 3: 277 us per loop
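
The timings above come from IPython's `%timeit` on Python 2. To repeat the comparison outside IPython on Python 3, something like the following sketch with the standard `timeit` module should work. The `candidates` dictionary, its labels, and the loop count are assumptions made for this example, and the numbers will differ by machine and Python version:

```python
import re
import timeit
from string import digits

g = "C0N4rtist" * 100
table = str.maketrans('', '', digits)  # Python 3 form of the translate call

# Each candidate removes the digits from g in a different way.
candidates = {
    "translate": lambda: g.translate(table),
    "regex": lambda: re.sub(r'\d+', '', g),
    "genexp isdigit": lambda: ''.join(i for i in g if not i.isdigit()),
    "listcomp isdigit": lambda: ''.join([i for i in g if not i.isdigit()]),
    "listcomp in digits": lambda: ''.join([i for i in g if i not in digits]),
}

# Sanity check: collect each candidate's output so they can be compared.
results = {name: fn() for name, fn in candidates.items()}

loops = 200
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=loops)
    print(f"{name:20s} {seconds / loops * 1e6:8.2f} us per loop")
```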
Ashwini Chaudhary
  • The translate call takes a slightly different form in Python 3.x, though: `g.translate(dict.fromkeys(map(ord, '0123456789')))`. (You may also want to try plain `\d` for the regex version: without the `+`, the regex engine doesn't need to track run lengths, so `\d+` might be introducing some unnecessary overhead.) – Jon Clements Dec 02 '13 at 19:01
  • +1 for `translate`, but IMO it should only be used in performance-critical situations as I think a generator expression is a bit more readable. – l4mpi Dec 02 '13 at 19:04
  • Well, it's not a completely fair timing as the generator expression can obviously be optimized using `in` and a set of the digits, but I think translate will still be faster than that :) – l4mpi Dec 02 '13 at 19:06
  • @l4mpi In fact generator expression is always slower with `str.join`, as it requires double iteration. – Ashwini Chaudhary Dec 02 '13 at 19:12
  • Yes, but `isdigit` has a performance overhead as well, as shown by your new timings. Interesting to see that the list comprehension is considerably faster than the generator expression, I didn't think it would make that much of a difference... but of course iterating over a generator will be slower than iterating over a list. – l4mpi Dec 02 '13 at 19:20
  • @l4mpi Actually `str.join` is a [special case](http://stackoverflow.com/a/9061024/846892). – Ashwini Chaudhary Dec 02 '13 at 19:23

There is no need to use regex for this. You can use the isdigit() string method:

    def removeDigitsFromStr(_str):
        result = ''.join(i for i in _str if not i.isdigit())
        return result
Saher Ahwal
  • I would prefer to use `in` in this case, as in `... if not i in digits`, where `digits` is `"0123456789"` or `set("0123456789")` - but that's mainly personal preference in this case and does not make much of a difference unless the code is performance-critical. – l4mpi Dec 02 '13 at 19:02
  • I agree. Thanks for bringing this up – Saher Ahwal Dec 02 '13 at 19:11
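
The membership-test variant suggested in the comments might look like the sketch below. `DIGITS` and `remove_digits` are names chosen for this example. One behavioural difference worth knowing: on Python 3, `str.isdigit()` also returns True for some non-ASCII characters such as `'²'`, whereas the explicit set only removes `0` through `9`.

```python
# Explicit set of the ASCII digits; membership tests on a frozenset are O(1).
DIGITS = frozenset("0123456789")

def remove_digits(text):
    """Return text with the ASCII digits 0-9 removed."""
    return ''.join([ch for ch in text if ch not in DIGITS])

print(remove_digits("C0N4rtist"))  # CNrtist
```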