Remove characters except digits from string using Python?

Question

How can I remove all characters except digits from a string?

@JG: I have gtk.Entry() and i want multiply float entered into it. — Jan Tojnar, Oct 03 '09 at 05:38
@JanTojnar use re.sub method as per answer two and explicitly list which chars to keep e.g. re.sub("[^0123456789\.]","","poo123.4and5fish") — Roger Heathcote, Dec 30 '12 at 16:26
If you only want to *check* if the string is all digits, see https://stackoverflow.com/questions/1323364. — Karl Knechtel, Aug 01 '22 at 20:12

score 284 · Answer 1 · edited Jul 04 '23 at 18:47

Use re.sub, like so:

>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'

\D matches any non-digit character, so the code above replaces every non-digit character with the empty string.

Or you can use filter, like so (in Python 2):

>>> filter(str.isdigit, 'aas30dsa20')
'3020'

In Python 3, filter returns an iterator instead of a string, so join the result:

>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'
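
One caveat worth illustrating: in Python 3, `str.isdigit` and `\D` do not agree on every character. The sketch below shows what each approach keeps when the input contains non-ASCII digits. The sample string and the `ascii_digits_only` helper are made up for illustration.

```python
import re

def ascii_digits_only(text):
    """Hypothetical helper: keep only the ASCII characters 0-9."""
    return re.sub(r'[^0-9]', '', text)

# SUPERSCRIPT TWO (U+00B2) and ARABIC-INDIC DIGIT THREE (U+0663)
sample = 'a1\u00b2b\u0663c4'

by_isdigit = ''.join(filter(str.isdigit, sample))  # keeps both non-ASCII digits
by_regex = re.sub(r'\D', '', sample)               # keeps U+0663 but drops U+00B2
by_ascii = ascii_digits_only(sample)               # keeps only '1' and '4'

print(ascii(by_isdigit), ascii(by_regex), ascii(by_ascii))
```

Passing `flags=re.ASCII` to `re.sub` also restricts `\D` to the ASCII meaning, which may be preferable when the result is headed for arithmetic.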

edited Jul 04 '23 at 18:47

Tim Tisdall

answered Sep 20 '09 at 12:18

João Silva

re is evil in such simple task, second one is the best I think, cause 'is...' methods are the fastest for strings. – f0b0s Sep 20 '09 at 12:25
your filter example is limited to py2k – SilentGhost Sep 20 '09 at 12:29
@f0b0s-iu9-info: did you time it? On my machine (py3k) re is twice as fast as filter with `isdigit`; a generator with `isdigit` is halfway between them – SilentGhost Sep 20 '09 at 12:35
@SilentGhost: Thanks, I was using IDLE from py2k. It's fixed now. – João Silva Sep 20 '09 at 12:35
For Python 3.6 it should be `re.sub("\\D", "", "aas30dsa20")` . Otherwise one gets a `DeprecationWarning: invalid escape sequence \D` . – asmaier Oct 17 '19 at 14:57
@asmaier Simply use `r` for raw string: `re.sub(r"\D+", "", "aas30dsa20")` – Mitch McMabers Nov 06 '19 at 19:34
This solution fails on decimals and negative numbers in accounting format, i.e. ($2,000) = 2000 not -2000 – Doug Jun 11 '20 at 21:55

score 118 · Accepted Answer · edited Oct 22 '14 at 21:17

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>

string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument -- the key part.

.translate works very differently on Unicode strings (and strings in Python 3 -- I do wish questions specified which major-release of Python is of interest!) -- not quite this simple, not quite this fast, though still quite usable.

Back to 2.*, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters:

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

also emits '1233344554552'. However, putting this in xx.py we have...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that the performance advantage disappears for this kind of "deletion" task and becomes a performance decrease.
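
For what it's worth, the fast two-argument form still exists in Python 3 on `bytes` objects, so data that is known to be ASCII can use much the same trick. A minimal sketch, where `NON_DIGITS` and `digits_only_bytes` are names invented for this example:

```python
import string

# Every byte value except the ASCII digits; these are the bytes to delete.
NON_DIGITS = bytes(set(range(256)) - set(string.digits.encode('ascii')))

def digits_only_bytes(data):
    """Delete every non-digit byte; a table of None means no remapping."""
    return data.translate(None, NON_DIGITS)

x = 'aaa12333bb445bb54b5b52'
result = digits_only_bytes(x.encode('ascii')).decode('ascii')
print(result)  # 1233344554552
```

The encode/decode round trip has its own cost and raises on non-ASCII text, so this is probably only worth considering when the data is already bytes.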

comprehensive, especial the Python3.x(Unicode) part. maybe Unicode is more powerful in a much bigger domain, for example: removing characters except Chinese characters from Unicode string — sunqiang, Sep 21 '09 at 01:54
@sunqiang, yes, absolutely -- there's a reason Py3k has gone to Unicode as THE text string type, instead of byte strings as in Py2 -- same reason Java and C# have always had the same "string means unicode" meme... some overhead, maybe, but MUCH better support for just about anything but English!-). — Alex Martelli, Sep 21 '09 at 02:07
This is all very well and good, but regular expressions do it in one line! — mlissner, Sep 10 '11 at 05:37
`x.translate(None, string.digits)` actually results in `'aaabbbbbb'`, which is the opposite of what is intended. — Tom Dalling, Mar 26 '12 at 08:12
Echoing comments from Tom Dalling, your first example keeps all the undesirable characters -- does the opposite of what you said. — Chris Johnson, Sep 04 '12 at 14:42
@RyanB.Lynch et al, the fault was with a later editor and two other users that [approved said edit](http://stackoverflow.com/review/suggested-edits/343120), which, in fact, is totally wrong. Reverted. — Nick T, Apr 11 '13 at 16:38
The best answer - the fastest solution. Just worth to mention that it beats also 'filter(lambda x: x.isdigit(), string_to_translate)' — Radoslaw Garbacz, Jul 01 '17 at 03:24
You can also do `x.translate(None, string.letters)` or `x.translate(None, string.letters+string.punctuation)` (if you know the input is all letters or letters/punctuation). It's a little more concise than using `maketrans`. — nrlakin, Oct 31 '17 at 00:21

score 77 · Answer 3 · answered Sep 20 '09 at 12:24

s=''.join(i for i in s if i.isdigit())

Another generator variant.

answered Sep 20 '09 at 12:24

f0b0s

Killed it..+1 Would have been even better if lamda was used – Barath Ravikumar Sep 07 '16 at 19:48
If you want to include any custom characters, for example include negatives or decimals - do this: `s = ''.join(i for i in s if i.isdigit() or i in '-./\\')` – Eugene Chabanov Aug 29 '20 at 20:46
Fantastic solution without any imports – Igor Atsberger Oct 07 '21 at 11:25
Just love this b/c it requires no imports!! – George Hayward Jun 04 '22 at 19:26
I would say this is the best solution so far ! – Kedar Joshi Dec 07 '22 at 19:34

score 18 · Answer 4 · edited Sep 20 '09 at 17:15

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

In Python 3 you have to join the result (kinda ugly :( ):

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

edited Sep 20 '09 at 17:15

answered Sep 20 '09 at 12:24

freiksenet

only in py2k, in py3k it returns a generator – SilentGhost Sep 20 '09 at 12:33
convert `str` to `list` to make sure it works on both py2 and py3: `''.join(filter(lambda x: x.isdigit(), list("dasdasd2313dsa")))` – Luiz C. Feb 09 '17 at 18:25

score 16 · Answer 5 · answered Aug 30 '16 at 19:03

You can easily do it using Regex

>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'

answered Aug 30 '16 at 19:03

Aminah Nuraini

By far the easiest way – Iorek Jul 28 '18 at 23:06
How is this different than João Silva's answer, which was provided 7 years earlier? – jww Jun 30 '19 at 13:10

score 14 · Answer 6 · answered Sep 20 '09 at 12:23

along the lines of bayer's answer:

''.join(i for i in s if i.isdigit())

answered Sep 20 '09 at 12:23

SilentGhost

No, this won't work for negative numbers because `-` is not a digit. – Oli May 15 '17 at 10:09

score 11 · Answer 7 · answered Dec 30 '12 at 16:31

The OP mentions in the comments that he wants to keep the decimal point. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep, e.g.

>>> import re
>>> re.sub(r"[^0123456789\.]", "", "poo123.4and5fish")
'123.45'
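
Since the goal is a float, one possible way to guard against input that leaves more than one decimal point behind is to count the periods before converting. This is only a sketch, and `to_float` is a hypothetical helper name:

```python
import re

def to_float(text):
    """Hypothetical helper: strip everything but digits and '.', then convert."""
    cleaned = re.sub(r'[^0-9.]', '', text)
    if cleaned.count('.') > 1:
        raise ValueError('more than one decimal point in %r' % text)
    # float() itself raises ValueError if nothing usable is left ('' or '.')
    return float(cleaned)

print(to_float('poo123.4and5fish'))  # 123.45
```

An input such as 'poo123.4and.5fish' raises ValueError here instead of silently producing a wrong number.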

answered Dec 30 '12 at 16:31

Roger Heathcote

What about "poo123.4and.5fish"? – Jan Tojnar Jan 01 '13 at 20:22
In my code I check for the number of periods in the input string and raise an error if that is more than 1. – Roger Heathcote Jan 04 '13 at 11:20

score 6 · Answer 8 · edited Mar 04 '13 at 13:26

x.translate(None, string.digits)

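will delete all digits from the string (this two-argument form works only on Python 2 byte strings). To delete the letters instead, which leaves the digits along with any other non-letter characters, do this: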

x.translate(None, string.letters)
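
A rough Python 3 counterpart uses the three-argument form of `str.maketrans`, whose third argument lists the characters to delete. The sample string below is invented for illustration; note that deleting letters alone leaves spaces and punctuation in place.

```python
import string

x = 'aaa 123, bb-45'

# Three-argument str.maketrans: the third argument lists characters to delete.
no_letters = x.translate(str.maketrans('', '', string.ascii_letters))

# To keep only digits, delete every distinct non-digit character found in x.
unwanted = ''.join(set(x) - set(string.digits))
digits_only = x.translate(str.maketrans('', '', unwanted))

print(repr(no_letters))   # ' 123, -45'
print(repr(digits_only))  # '12345'
```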

edited Mar 04 '13 at 13:26

Gilles 'SO- stop being evil'

answered Mar 04 '13 at 13:00

Terje Molnes

I get a `TypeError`: translate() takes exactly one argument (2 given). Why this question was upvoted in its current state is quite frustrating. – Bobort Oct 13 '16 at 15:11
translate changed from python 2 to 3. The syntax using this method in python 3 is x.translate(str.maketrans('', '', string.digits)) and x.translate(str.maketrans('', '', string.ascii_letters)) . Neither of these strips white space. I wouldn't really recommend this approach anymore... – ZaxR Aug 16 '18 at 19:19

score 5 · Answer 9 · answered Sep 20 '09 at 12:21

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

answered Sep 20 '09 at 12:21

bayer

instead do `''.join(n for n in foo if n.isdigit())` – shxfee Apr 07 '15 at 06:33
With a small modification, `"".join([i for i in s if i in "0123456789"])` , bayer's solution is faster than using "isdigit". It performs in 15% less time. Of all the solutions presented on this page, the quickest is @rescdsk 's. However, when it is not a loop, it is better to stick with the quickest "one line" solution. – Anselmo Blanco Dominguez Jan 22 '21 at 21:38

score 5 · Answer 10 · answered Oct 22 '14 at 21:09

A fast version for Python 3:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)

Here's a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So it's a little bit more than 3 times faster than regex, for me. It's also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here's that version on my same system, for comparison.

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
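
A short usage sketch for the `keeper` function, repeated here so the snippet runs on its own. `decimal_keeper` is a name made up for this example, showing that the same table builder can keep other characters too:

```python
from collections import defaultdict
import string

def keeper(keep):
    # Missing ordinals map to None (delete); kept characters map to themselves.
    table = defaultdict(type(None))
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)
decimal_keeper = keeper(string.digits + '.')  # hypothetical variant

digits = 'aaa12333bb445bb54b5b52'.translate(digit_keeper)
decimal = 'poo123.4and5fish'.translate(decimal_keeper)
print(digits, decimal)  # 1233344554552 123.45
```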

score 5 · Answer 11 · edited Apr 22 '21 at 13:09

Try:

import re

string = '1abcd2XYZ3'
string_without_letters = re.sub(r'[a-z]', '', string.lower())

This should give '123'.

edited Apr 22 '21 at 13:09

dboy

answered Dec 15 '20 at 18:45

João

so `[a-z]` means all lowercase letters or for uppercase we have to `[A-Z]`? – Muneeb Ahmad Khurram Jun 13 '21 at 09:21
[a-z] will work for both lower and uppercases :) – João Jun 14 '21 at 13:14
yes, because I just noticed the `string.lower()` is your best friend. – Muneeb Ahmad Khurram Jun 14 '21 at 19:39

score 2 · Answer 12 · answered Sep 20 '09 at 12:23

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

answered Sep 20 '09 at 12:23

Gant

@SilentGhost it's my misunderstanding. had it corrected thanks :) – Gant Sep 20 '09 at 12:26
Actually, with this method, I don't think you need to use "join." `filter(lambda x: x.isdigit(), s)` worked fine for me. ...oh, it's because I'm using Python 2.7. – Bobort Oct 13 '16 at 15:21

score 2 · Answer 13 · edited Jul 16 '18 at 20:32

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.48 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 2.02 usec per loop

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.37 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 1.97 usec per loop

I observed that join is faster than sub.
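
Note that `[a-z]+` in the commands above collects the letters, not the digits. A like-for-like comparison would presumably use `\d+` with `findall`, roughly as sketched here:

```python
import re
import timeit

x = 'aaa12333bb445bb54b5b52'

via_sub = re.sub(r'\D', '', x)
via_findall = ''.join(re.findall(r'\d+', x))  # \d+ collects runs of digits
print(via_sub, via_findall)  # 1233344554552 1233344554552

# Timings vary by machine and Python version, so no numbers are claimed here.
for stmt in (r"re.sub(r'\D', '', x)", r"''.join(re.findall(r'\d+', x))"):
    print(stmt, timeit.timeit(stmt, globals={'re': re, 'x': x}, number=10000))
```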

edited Jul 16 '18 at 20:32

answered Jul 16 '18 at 19:21

AnilReddy

Why are you repeating the two methods twice? And could you describe how is your answer different from the accepted one? – Jan Tojnar Jul 16 '18 at 22:56
Both results the same output. But, I just wanna show that join is faster the sub method in the results. – AnilReddy Jul 17 '18 at 11:55
They do not, your code does the opposite. And also you have four measurements but only two methods. – Jan Tojnar Jul 17 '18 at 13:44

score 2 · Answer 14 · edited May 17 '19 at 21:12

You can read each character and, if it is a digit, include it in the answer. The str.isdigit() method tells you whether a character is a digit.

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'

edited May 17 '19 at 21:12

answered May 17 '19 at 20:54

alfredo

how is this different from the answer by f0b0s? You should edit that answer instead if you have more information to bring – chevybow May 17 '19 at 21:13

score 2 · Answer 15 · answered Sep 02 '22 at 03:40

You can use join + filter + lambda:

''.join(filter(lambda s: s.isdigit(), "20 years ago, 2 months ago, 2 days ago"))

Output: '2022'

answered Sep 02 '22 at 03:40

Faisal Fida

score 0 · Answer 16 · answered Jan 24 '18 at 11:03

Not a one liner but very simple:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    # Keep only the digit characters
    if char.isdigit():
        buffer += char

print( buffer )

answered Jan 24 '18 at 11:03

Josh

score 0 · Answer 17 · edited May 18 '19 at 18:06

I used this. 'letters' should contain all the letters that you want to get rid of:

Output = Input.translate({ord(i): None for i in 'letters'})

Example:

Input = "I would like 20 dollars for that suit"
# 'I' and the space are included so that only the digits remain
Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxyzI '})
print(Output)

Output: 20

edited May 18 '19 at 18:06

chb

answered May 18 '19 at 17:26

Gustav

score 0 · Answer 18 · answered Oct 18 '20 at 22:58

my_string = "sdfsdfsdfsfsdf353dsg345435sdfs525436654.dgg("
my_string = ''.join((ch if ch in '0123456789' else '') for ch in my_string)
print("output: " + my_string)

output: 353345435525436654

answered Oct 18 '20 at 22:58

Kokul Jose

add this, as well for decimal point numbers, `if ch in '0123456789.' else ''` so that a `.` is also added. – Muneeb Ahmad Khurram Aug 17 '21 at 14:28

score 0 · Answer 19 · answered Sep 30 '22 at 11:00

Another one:

import re

re.sub('[^0-9]', '', 'ABC123 456')

Result:

'123456'

answered Sep 30 '22 at 11:00

David
