How to remove characters from the string?

Question

How can I remove the letters from a column whose values mix digits and letters?

This is the dataset:

Name
0yrs 0mon
11yrs 11mon 
2yrs 2mon
3yrs 5mon

This is the expected output:

Name
0.0
11.11
2.2
3.5

This is the actual output:

Name
0.0.
11.11.
2.2.
3.5.

I tried the following command:

df.Name = df.Name.str.replace(r'\D+', '.', regex=True)

Why does the second row become 1.11 rather than 11.11? – sjw Jul 25 '19 at 18:34
Sorry for the typo. I should have written 11.11. – Saurabh Borude Jul 25 '19 at 18:35

score 0 · Answer 1 · answered Jul 25 '19 at 18:34

I see two trivial ways of doing this with a second pass. One is to simply remove the trailing period. The other is to replace the letter strings separately: a dot if it's separated by a space; remove it otherwise.

df.Name = df.Name.str.replace(r'[A-Za-z]+ ', '.', regex=True)
df.Name = df.Name.str.replace(r'[A-Za-z]+', '', regex=True)
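As a rough sketch of the two-pass idea, here it is run on the sample rows from the question. The variable names `step1` and `result` are only illustrative. A letters-only pattern is assumed for both passes, because a second pass with `\D+` would also match the dot inserted by the first pass and delete it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["0yrs 0mon", "11yrs 11mon", "2yrs 2mon", "3yrs 5mon"]})

# Pass 1: a run of letters followed by a space becomes the separating dot.
step1 = df["Name"].str.replace(r"[A-Za-z]+ ", ".", regex=True)

# Pass 2: any remaining run of letters is dropped. The pattern matches
# letters only, so the dot from pass 1 is left alone.
result = step1.str.replace(r"[A-Za-z]+", "", regex=True)

print(result.tolist())
```

The result is still a column of strings, not numbers.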

score 0 · Answer 2 · answered Jul 25 '19 at 18:37

Try using

import re

regex = re.compile(r"(\d+)\w+ (\d+)\w+")
df.Name = df.Name.str.replace(regex, r"\1.\2", regex=True)

The replacement string refers back to the capturing groups from the regex. The \d+ matches the numbers, and the \w+ matches the remaining Unicode word characters immediately afterwards.
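To see the capturing groups in isolation, here is a small sketch that uses only the standard `re` module on plain strings, with no pandas involved. The list `samples` and the name `converted` are made up for the illustration. The values are the rows from the question.

```python
import re

# Two capturing groups: the digits in front of each unit word.
pattern = re.compile(r"(\d+)\w+ (\d+)\w+")

samples = ["0yrs 0mon", "11yrs 11mon", "2yrs 2mon", "3yrs 5mon"]

# \1 and \2 in the replacement refer back to the first and second group,
# so only the two numbers survive, joined by a dot.
converted = [pattern.sub(r"\1.\2", s) for s in samples]

print(converted)
```

Because the pattern names both numbers explicitly, no trailing dot is produced and no second pass is needed.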

answered Jul 25 '19 at 18:37

StardustGogeta


Yes, I did something similar: `re.sub('[a-z]{3}\s', '.', '11yrs 11mon', 1)` gives `'11.11mon'`, and then `re.sub('[a-z]{3}', '', '11.11mon', 1)` gives `'11.11'`. – deon cagadoes Jul 25 '19 at 19:01

sjw · Accepted Answer · 2019-07-25T18:45:00.103

Try chaining rstrip to the end of what you already tried and that should strip out the trailing period:

In [5]: df.Name.str.replace(r'\D+', '.', regex=True).str.rstrip('.')
Out[5]: 
0      0.0
1    11.11
2      2.2
3      3.5
Name: Name, dtype: object
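For anyone who wants to reproduce this from scratch, a self-contained sketch follows. The DataFrame is rebuilt from the rows in the question, and `with_dots` and `cleaned` are illustrative names. `regex=True` is passed explicitly on the assumption of a recent pandas release, where `str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Name": ["0yrs 0mon", "11yrs 11mon", "2yrs 2mon", "3yrs 5mon"]})

# Every run of non-digit characters becomes a dot, including the final
# "mon", which is what leaves the unwanted trailing dot.
with_dots = df["Name"].str.replace(r"\D+", ".", regex=True)

# rstrip removes dots from the right-hand end only, so the dot between
# the two numbers is kept.
cleaned = with_dots.str.rstrip(".")

print(cleaned.tolist())
```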

edited Jul 25 '19 at 18:45

answered Jul 25 '19 at 18:42

sjw


Good idea. For that matter, you could just do `[:-1]` to cut off the last character, too. – StardustGogeta Jul 25 '19 at 18:43
@StardustGogeta - although it's not explicitly mentioned in the question, my understanding is that `df` is a Pandas DataFrame, so `[:-1]` wouldn't work. – sjw Jul 25 '19 at 18:46
I am actually not familiar with Pandas, so I wouldn't know anything about that, but thank you for letting me know. – StardustGogeta Jul 25 '19 at 18:51
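A short sketch of the difference discussed in these comments, assuming `df` is a pandas DataFrame holding the values with the trailing dot still attached. The names `rows` and `chars` are illustrative. Slicing the DataFrame directly selects rows, while slicing through the `.str` accessor is applied to each string.

```python
import pandas as pd

# Values as they look after the first replace, trailing dot included.
df = pd.DataFrame({"Name": ["0.0.", "11.11.", "2.2.", "3.5."]})

# Slicing the DataFrame itself works on rows: this drops the last row.
rows = df[:-1]

# Slicing through .str works per element: this drops the last
# character of every string in the column.
chars = df["Name"].str[:-1]

print(len(rows))
print(chars.tolist())
```

Note that `.str[:-1]` removes the last character whatever it is, whereas `.str.rstrip('.')` only removes trailing dots, so the latter is the safer choice if some rows might not end in a dot.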
