How to remove special characters from a column of a DataFrame using the re module?

Question

Hey, I have seen that link, but none of the answers there use the re module, which is why I posted here. I hope you understand and remove the duplicate flag.

Here is the link. I want to use the re module.

Table:

A    B    C    D
1    Q!   W@   2
2    1$   E%   3
3    S2#  D!   4

Here I want to remove the special characters from columns B and C. I have used .transform(), but I would like to do it with re if possible; however, I am getting errors.

Expected output:

A    B    C    D   E   F
1    Q!   W@   2   Q   W
2    1$   E%   3   1   E
3    S2#  D!   4   S2  D

My Code:

df['E'] = df['B'].str.translate(None, ",!.; -@!%^&*)(")

It works only if I know in advance which special characters are present.

But I want to use re, which I think would be the best way.

import re
#re.sub(r'\W+', '', your_string)
df['E'] = re.sub(r'\W+', '', df['B'].str)

Here I am getting an error:

TypeError: expected string or buffer

So how should I pass the value to get the correct output?

The answers for that dupe question aren't all that suitable here: use `str.replace('\W+', '')`. This uses `re.sub` under the hood. — Alex Riley, Oct 21 '15 at 10:59
You can use whatever `lambda` expression you like, such as `lambda x: re.sub(r'\W+', '', x)`. — TigerhawkT3, Oct 21 '15 at 11:01
Rahul, if that duplicate's answers just aren't enough to solve your issue, ping me (include "@TigerhawkT3" in a comment here) and I'll reopen this. — TigerhawkT3, Oct 21 '15 at 11:25
@TigerhawkT3 I was trying to use `re`, and I don't know how to use a lambda here. That's why I posted it. In the link you gave, they were removing only known characters. But `str.replace('\W+', '')` from the comment is what I wanted. If you post that as an answer, or show how I should use the lambda function, I would be grateful. — Rahul Shrivastava, Oct 21 '15 at 11:30
Sure, I'll reopen it and let @ajcr post with his `str.replace` solution. — TigerhawkT3, Oct 21 '15 at 11:32

Amir Imani · Answer 1 · 2020-05-02T11:53:40.210

41

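A one-liner without map (pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default):

df['E'] = df['B'].str.replace(r'\W', '', regex=True)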
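
For reference, here is a minimal, self-contained sketch of that one-liner applied to the sample data from the question. The DataFrame is rebuilt by hand from the question's table, and the sketch assumes a recent pandas, where `Series.str.replace` only interprets the pattern as a regular expression when `regex=True` is passed.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["Q!", "1$", "S2#"],
    "C": ["W@", "E%", "D!"],
    "D": [2, 3, 4],
})

# \W matches any character that is not a letter, digit, or underscore.
# regex=True is passed explicitly so the pattern is not taken literally.
df["E"] = df["B"].str.replace(r"\W", "", regex=True)
df["F"] = df["C"].str.replace(r"\W", "", regex=True)

print(df)
```

Columns E and F then hold the cleaned values while B and C are left untouched, which matches the output layout asked for in the question.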

answered Nov 30 '17 at 19:43

Amir Imani

8

This is a nice solution for the OP's problem, but be aware that it also removes whitespace. If you want to keep spaces, use: `df['B'].str.replace(r'[^\w\s]', '', regex=True)` – akis Mar 08 '22 at 16:07
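
To illustrate the difference this comment describes, here is a small sketch comparing the two patterns. The sample strings are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample values containing spaces and punctuation.
s = pd.Series(["Hello, World!", "a b#c"])

# \W also matches whitespace, so the spaces are removed too.
no_spaces = s.str.replace(r"\W", "", regex=True)

# [^\w\s] matches only characters that are neither word characters
# nor whitespace, so the spaces survive.
keep_spaces = s.str.replace(r"[^\w\s]", "", regex=True)

print(no_spaces.tolist())
print(keep_spaces.tolist())
```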

score 23 · Accepted Answer · edited May 23 '17 at 10:31

As this answer shows, you can use map() with a lambda function that will assemble and return any expression you like:

df['E'] = df['B'].map(lambda x: re.sub(r'\W+', '', x))

`lambda` simply defines an anonymous function. You can leave it anonymous or assign it to a name like any other object: `my_function = lambda x: x.my_method(3)` is equivalent to `def my_function(x): return x.my_method(3)`.
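
As a rough, self-contained sketch of the `map()` approach: `re.sub` works on one string at a time, which is why passing `df['B'].str` to it raised the `TypeError` in the question, while `map()` calls the function once per cell. The names `NON_WORD` and `strip_non_word` are made up for this example, and the missing value is added only to show `na_action="ignore"`; neither comes from the original post.

```python
import re
import pandas as pd

# Column B from the question, plus one missing value for illustration.
df = pd.DataFrame({"B": ["Q!", "1$", "S2#", None]})

# Hypothetical helper names; compiling once avoids re-parsing the pattern per row.
NON_WORD = re.compile(r"\W+")

def strip_non_word(value):
    """Remove every run of non-word characters from a single string."""
    return NON_WORD.sub("", value)

# Named-function version; na_action="ignore" skips missing values,
# which would otherwise make re raise a TypeError.
df["E"] = df["B"].map(strip_non_word, na_action="ignore")

# Equivalent lambda version, as in the answer.
df["E_lambda"] = df["B"].map(lambda x: re.sub(r"\W+", "", x), na_action="ignore")

print(df)
```

Both columns contain the same cleaned strings; the named function is just easier to reuse and test than the inline lambda.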

answered Oct 21 '15 at 11:39

TigerhawkT3

Thanks for replying. I will try lambda for my queries in the future. – Rahul Shrivastava Oct 21 '15 at 11:41