pandas - Remove a particular character as well as the previous and subsequent characters

Question

I have converted Bengali text into English phonetics. After parsing, I got some junk characters that I want to remove. My data frame looks like this:

col1        
utto্tor        
dokkho্shin     
muuns্si

I want to remove each junk character along with the characters immediately before and after it. For example, in the first row I want to remove ্ as well as the o and t adjacent to it.

My desired output looks like this:

col1            col2
utto্tor        uttor
dokkho্shin     dokkhhin
muuns্si        muuni

P.S. I got these characters by using the Avro parser, as shown below:

reversed_text = avro.reverse("উত্তর")
print(reversed_text)

output: utto্tor

col0        col1
উত্তর       utto্tor
দক্ষিণ      dokkho্shin
মুন্সী         muuns্si

score 1 · Accepted Answer · answered Nov 03 '22 at 09:45

You can use str.replace with a regex that matches each non-ASCII character together with the character before and the character after it, and replaces the whole match with an empty string:

df['col2'] = df['col1'].str.replace(r'.[^\x00-\x7F].', '', regex=True)

output:

         col1      col2
0     utto্tor     uttor
1  dokkho্shin  dokkhhin
2     muuns্si     muuni
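
One limitation of the pattern above is that it needs a character on both sides, so a stray character at the very start or end of a string would be left in place, and it also removes any other non-ASCII character. A minimal sketch of a narrower variant follows, assuming the junk character is always the Bengali hasanta (U+09CD); the `HASANTA` name and the fourth sample row are illustrative additions, not part of the original data.

```python
import pandas as pd

# Assumption: the stray character is the Bengali hasanta, U+09CD.
HASANTA = "\u09CD"

df = pd.DataFrame(
    {"col1": ["utto\u09CDtor", "dokkho\u09CDshin", "muuns\u09CDsi", "\u09CDab"]}
)

# ".?" makes each neighbour optional, so the hasanta is still removed
# when it sits at the start or end of the string.
df["col2"] = df["col1"].str.replace(f".?{HASANTA}.?", "", regex=True)

print(df["col2"].tolist())  # ['uttor', 'dokkhhin', 'muuni', 'b']
```

The last row shows the edge case: the hasanta has no preceding character, so only it and the following `a` are dropped.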


mozway


thanks! Would you mind explaining which part of the regex removes the characters before/after? @mozway – asif abdullah Nov 03 '22 at 10:53
@asif the two periods `.` – mozway Nov 03 '22 at 10:54
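
To make the role of the two periods concrete, here is a small sketch that splits the answer's pattern into its three parts using only the standard `re` module. The sample strings are the question's values written with `\u09CD` escapes, on the assumption that the stray character is U+09CD.

```python
import re

# The answer's pattern, one piece per line.
pattern = re.compile(
    r"."              # any single character before the non-ASCII one
    r"[^\x00-\x7F]"   # one character outside the ASCII range
    r"."              # any single character after it
)

samples = ["utto\u09CDtor", "dokkho\u09CDshin", "muuns\u09CDsi"]
cleaned = [pattern.sub("", s) for s in samples]

print(cleaned)  # ['uttor', 'dokkhhin', 'muuni']
```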

score 0 · Answer 2 · answered Nov 03 '22 at 09:40

The pandas str accessor should provide the functionality you need: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.html

Example:

import pandas as pd

df = pd.DataFrame({'Col1': ['Text1', 'Text2']})
# Replace the literal substring "Text" in every row of Col1
df['Col1'] = df['Col1'].str.replace("Text", "newText", regex=False)
print(df)

It also supports regular expressions (pass regex=True).


Marcel Flygare


My "newText" isn't always static, so I can't predefine the characters. – asif abdullah Nov 03 '22 at 10:56
@mozway's answer does the trick for the concrete question. – Marcel Flygare Nov 03 '22 at 11:00
