Replace all non-alphanumeric characters in a string

Question

I have a string in which I want to replace every character that isn't a standard letter or digit (a-z or 0-9) with an asterisk. For example, "h^&ell`.,|o w]{+orld" becomes "h*ell*o*w*orld". Note that a run of several such characters, like "^&", is replaced with a single asterisk. How would I go about doing this?

Possible duplicate of [Stripping everything but alphanumeric chars from a string in Python](http://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python) — sds, Nov 03 '16 at 20:23
Short example `re.sub(r'\W+', '_', 'bla: bla**(bla)')` replaces one or more consecutive non-alphanumeric characters by an underscore. — Paul Rougieux, Oct 14 '21 at 15:33

score 254 · Accepted Answer · answered Oct 20 '12 at 05:11

Regex to the rescue!

import re

s = re.sub('[^0-9a-zA-Z]+', '*', s)

Example:

>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'
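
As a minimal sketch, the same substitution can be wrapped in a small helper. The function name `mask_non_alnum` and its `repl` parameter are hypothetical, introduced here only for illustration; the pattern is the one from the answer.

```python
import re

def mask_non_alnum(text, repl="*"):
    """Replace each run of characters outside 0-9, a-z, A-Z with `repl`.

    Hypothetical helper name; the pattern comes from the answer above.
    """
    # The trailing + makes a whole run of unwanted characters collapse
    # into a single replacement instead of one replacement per character.
    return re.sub(r"[^0-9a-zA-Z]+", repl, text)

print(mask_non_alnum("h^&ell`.,|o w]{+orld"))  # h*ell*o*w*orld
```

Strings that are already purely alphanumeric, and the empty string, pass through unchanged.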

answered Oct 20 '12 at 05:11 by nneonneo

If you handle unicode a lot, you may also need to keep all non-ASCII unicode symbols: `re.sub("[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+", " ", ":%# unicode ΣΘΙП@./\n")` – zhazha Jul 13 '16 at 07:43
If you want to keep spaces in your string, just add a space within the brackets: s = re.sub('[^0-9a-zA-Z ]+', '*', s) – stackPusher Oct 20 '16 at 16:31
If doing more than one replace, this will perform slightly quicker if you pre-compile the regex, e.g., `import re; regex = re.compile('[^0-9a-zA-Z]+'); regex.sub('*', 'h^&ell.,|o w]{+orld')` – Chris Jun 02 '18 at 15:47
Also note `\W` is for non-word characters, it's almost the same but allows the underscore as a word character (don't know why): https://docs.python.org/3.6/library/re.html#index-32 – JHS Dec 23 '18 at 01:32
You don't need the '+' in the regex – Serg Sep 17 '20 at 15:02
@Serg: The OP wanted to replace multiple consecutive characters with a single `*` - hence, the `+` in the regex. – nneonneo Sep 19 '20 at 00:13
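
A rough sketch of two points raised in these comments: pre-compiling the pattern for repeated use, and what the `+` changes. The constant names `NON_ALNUM_RUN` and `NON_ALNUM_CHAR` are made up for this example.

```python
import re

# Compile once and reuse for many strings (pattern taken from the answer).
NON_ALNUM_RUN = re.compile(r"[^0-9a-zA-Z]+")
# Same character class without +: each unwanted character is replaced
# on its own, so runs are not collapsed.
NON_ALNUM_CHAR = re.compile(r"[^0-9a-zA-Z]")

s = "h^&i"
collapsed = NON_ALNUM_RUN.sub("*", s)
one_each = NON_ALNUM_CHAR.sub("*", s)

print(collapsed)  # h*i
print(one_each)   # h**i
```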

score 55 · Answer 2 · edited Feb 28 '18 at 20:33

The pythonic way.

print("".join(c if c.isalnum() else "*" for c in s))

This doesn't collapse multiple consecutive non-matching characters into one, though, i.e.

"h^&i" => "h**i", not "h*i" as in the regex solutions.
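
If a regex-free version that also collapses runs is wanted, one possible sketch uses `itertools.groupby` to group consecutive characters by whether they are alphanumeric. This is a swapped-in technique, not part of the original answer, and the helper name `mask_runs` is hypothetical. Note that in Python 3 `str.isalnum` also accepts non-ASCII letters and digits, so it is not identical to `[0-9a-zA-Z]`.

```python
from itertools import groupby

def mask_runs(text, repl="*"):
    """Replace each run of non-alphanumeric characters with `repl`.

    Hypothetical helper; relies on str.isalnum, which is Unicode-aware
    in Python 3.
    """
    pieces = []
    # groupby yields consecutive characters sharing the same isalnum() result.
    for is_alnum, group in groupby(text, key=str.isalnum):
        pieces.append("".join(group) if is_alnum else repl)
    return "".join(pieces)

print(mask_runs("h^&i"))  # h*i
```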

edited Feb 28 '18 at 20:33 by crizCraig

answered Feb 28 '14 at 13:27 by baloan

score 17 · Answer 3 · edited Jun 27 '19 at 07:27

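Try (Python 2, where filter on a str returns a str):

s = filter(str.isalnum, s)

In Python 3, filter returns an iterator, so join the result:

s = ''.join(filter(str.isalnum, s))

Edit: I realized that the OP wants to replace the non-alphanumeric characters with '*', whereas this only removes them, so my answer does not fit.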

edited Jun 27 '19 at 07:27

answered Jan 05 '15 at 05:15 by Don

score 15 · Answer 4 · edited Nov 19 '21 at 19:08

Use \W, which is equivalent to [^a-zA-Z0-9_] (for str patterns in Python 3, only when the re.ASCII flag is set). Check the documentation: https://docs.python.org/2/library/re.html

import re
s = 'h^&ell`.,|o w]{+orld'
# \W+ matches one or more consecutive non-word characters
replaced_string = re.sub(r'\W+', '*', s)
print(replaced_string)  # h*ell*o*w*orld

Update: this solution also leaves underscores untouched, because \W does not match them. If only letters and digits should be kept, the solution by nneonneo is more appropriate.

edited Nov 19 '21 at 19:08 by Csaba Toth

answered Aug 12 '16 at 18:54 by psun

Note that *`\W` is equivalent to `[^a-zA-Z0-9_]`* only in Python 2.x. In Python 3.x, `\W+` is equivalent to `[^a-zA-Z0-9_]` only if `re.ASCII` / `re.A` flag is used. – Wiktor Stribiżew Apr 01 '19 at 20:27
Updated [link to the documentation of re](https://docs.python.org/3/library/re.html), search for `\W` in the page "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched." – Paul Rougieux Oct 14 '21 at 15:31
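
To illustrate the Python 3 behaviour described in these comments, here is a small sketch comparing `\W+` with and without `re.ASCII`. The sample string is invented for the example; the accented letters are written as escapes so the result does not depend on source encoding.

```python
import re

# "naive_cafe!!" with i-diaeresis (U+00EF) and e-acute (U+00E9).
s = "na\u00efve_caf\u00e9!!"

# Default for str patterns in Python 3: \W is Unicode-aware, so the
# accented letters count as word characters and only "!!" is replaced.
unicode_result = re.sub(r"\W+", "*", s)

# With re.ASCII, \W matches everything outside [a-zA-Z0-9_], so the
# accented letters are replaced too. The underscore is kept in both cases.
ascii_result = re.sub(r"\W+", "*", s, flags=re.ASCII)

print(unicode_result)
print(ascii_result)
```

With the explicit class `[^0-9a-zA-Z]+` from the accepted answer, the underscore would be replaced as well.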
