19

Hey, I have seen that link, but none of the answers there use the re module, which is why I posted here. I hope you understand and remove the duplicate flag.

Here is the link. I want to use the re module.

Table:

A    B    C    D
1    Q!   W@   2
2    1$   E%   3
3    S2#  D!   4

Here I want to remove the special characters from columns B and C. I have used .transform(), but I want to do it with re if possible, and I am getting errors.

Expected output:

A    B    C    D   E   F
1    Q!   W@   2   Q   W
2    1$   E%   3   1   E
3    S2#  D!   4   S2  D

My Code:

df['E'] = df['B'].str.translate(None, ",!.; -@!%^&*)(")

It works only if I know in advance which special characters to remove.

But I want to use re, which would be the best way.

import re
#re.sub(r'\W+', '', your_string)
df['E'] = re.sub(r'\W+', '', df['B'].str)

Here I get this error:

TypeError: expected string or buffer

So how should I pass the values to get the correct output?

cs95
Rahul Shrivastava
  • The answers for that dupe question aren't all that suitable here: use `str.replace('\W+', '')`. This uses `re.sub` under the hood. – Alex Riley Oct 21 '15 at 10:59
  • You can use whatever `lambda` expression you like, such as `lambda x: re.sub(r'\W+', '', x)`. – TigerhawkT3 Oct 21 '15 at 11:01
  • Rahul, if that duplicate's answers just aren't enough to solve your issue, ping me (include "@TigerhawkT3" in a comment here) and I'll reopen this. – TigerhawkT3 Oct 21 '15 at 11:25
  • @TigerhawkT3 I was trying to use `re` and I don't know lamda usage here. That's why I posted it. In your given link they were removing only known characters. But your comment `str.replace('\W+', '')` what I wanted. If you post that as answer or lamda function how should I use. It would be grateful. – Rahul Shrivastava Oct 21 '15 at 11:30
  • Sure, I'll reopen it and let @ajcr post with his `str.replace` solution. – TigerhawkT3 Oct 21 '15 at 11:32

2 Answers

41

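A one-liner without map (on pandas 2.0 and later, pass regex=True, because the pattern is otherwise treated as a literal string):

df['E'] = df['B'].str.replace(r'\W', '', regex=True)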
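For reference, here is a minimal self-contained sketch of this approach applied to the table from the question. It assumes a pandas version that accepts the `regex` keyword; the loop over column pairs is only an illustrative way to handle both B and C and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["Q!", "1$", "S2#"],
    "C": ["W@", "E%", "D!"],
    "D": [2, 3, 4],
})

# Strip every character that is not a letter, digit, or underscore.
# regex=True is explicit so the pattern is not taken as a literal string.
for src, dst in [("B", "E"), ("C", "F")]:
    df[dst] = df[src].str.replace(r"\W", "", regex=True)

print(df)
```

Columns E and F then hold the cleaned values, while B and C are left unchanged.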
Amir Imani
  • This is a nice solution for OP's problem, but be aware that it also removes whitespace. If you want to keep spaces, use `df['B'].str.replace('[^\w\s]', '')`. – akis Mar 08 '22 at 16:07
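The difference between the two patterns can be seen with plain `re`, outside pandas. This is a small sketch; `sample` is a made-up string, not data from the question.

```python
import re

sample = "S2# and_more!"  # hypothetical input with a space and an underscore

# \W matches anything that is not a letter, digit, or underscore,
# so the space is removed along with the punctuation.
no_nonword = re.sub(r"\W", "", sample)

# [^\w\s] leaves whitespace alone and strips only the punctuation.
keep_spaces = re.sub(r"[^\w\s]", "", sample)

print(no_nonword)   # S2and_more
print(keep_spaces)  # S2 and_more
```

Note that the underscore survives in both cases, because `\w` counts it as a word character.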
23

As this answer shows, you can use map() with a lambda function that will assemble and return any expression you like:

df['E'] = df['B'].map(lambda x: re.sub(r'\W+', '', x))

`lambda` simply defines anonymous functions. You can leave them anonymous, or assign them to a reference like any other object. `my_function = lambda x: x.my_method(3)` is equivalent to `def my_function(x): return x.my_method(3)`.
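To make the equivalence concrete, here is a short sketch using the built-in `map()` on a plain list, so it runs without pandas. The names `strip_lambda`, `strip_def`, and `values` are illustrative only; a pandas Series `.map()` call would be handed the same kind of function.

```python
import re

# Anonymous form, as passed to map() in the answer above.
strip_lambda = lambda x: re.sub(r"\W+", "", x)

# Equivalent named form.
def strip_def(x):
    return re.sub(r"\W+", "", x)

values = ["Q!", "1$", "S2#"]  # column B from the question

cleaned_lambda = list(map(strip_lambda, values))
cleaned_def = list(map(strip_def, values))

print(cleaned_lambda)  # ['Q', '1', 'S2']
print(cleaned_def)     # ['Q', '1', 'S2']
```

Either form expects every element to be a string. If the column contains missing values, `re.sub` would receive a non-string and raise a TypeError much like the one in the question, so such rows would need handling first.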

Community
TigerhawkT3