How do I remove the letters from a column whose values contain both digits and letters?

This is the dataset:

Name
0yrs 0mon
11yrs 11mon 
2yrs 2mon
3yrs 5mon

This is the expected output:

Name
0.0
11.11
2.2
3.5

This is the actual output:

Name
0.0.
11.11.
2.2.
3.5.

I tried using this command:

df.Name = df.Name.str.replace('\D+','.')

3 Answers

I see two trivial ways of doing this with a second pass. One is to simply remove the trailing period. The other is to replace the letter strings separately: a dot if it's separated by a space; remove it otherwise.

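# Letters followed by a space and then a digit become the separating dot.
df.Name = df.Name.str.replace(r'\D+ (?=\d)', '.', regex=True)
# Use [^\d.]+ rather than \D+ here, or the dot inserted above is removed too.
df.Name = df.Name.str.replace(r'[^\d.]+', '', regex=True)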
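To see why the second pass needs care, here is a minimal sketch on the question's sample values. It assumes pandas is installed and that the values carry no trailing whitespace; the variable names (`names`, `dotted`, `too_greedy`, `cleaned`) are made up for illustration.

```python
import pandas as pd

# Sample values from the question (assumed to have no trailing whitespace).
names = pd.Series(["0yrs 0mon", "11yrs 11mon", "2yrs 2mon", "3yrs 5mon"], name="Name")

# Pass 1: a run of non-digits ending in a space becomes the separating dot.
dotted = names.str.replace(r"\D+ ", ".", regex=True)

# Pass 2a: \D+ also matches the dot, so the dot is stripped with the letters.
too_greedy = dotted.str.replace(r"\D+", "", regex=True)

# Pass 2b: excluding the dot from the character class keeps it in place.
cleaned = dotted.str.replace(r"[^\d.]+", "", regex=True)

print(too_greedy.tolist())  # ['00', '1111', '22', '35']
print(cleaned.tolist())     # ['0.0', '11.11', '2.2', '3.5']
```

The results are still strings; whether to convert them further depends on what the column is used for.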

Prune

Try using

import re

regex = re.compile(r"(\d+)\w+ (\d+)\w+")
df.Name = df.Name.str.replace(regex, r"\1.\2", regex=True)

This link explains the replacement with capturing groups from the regex. The \d+ matches the numbers and the \w+ matches the remaining Unicode word characters immediately afterwards.
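For anyone who wants to see the capturing groups at work outside pandas, here is a small sketch using only the standard `re` module on plain strings. The helper name `years_months` and the `samples` list are hypothetical, and stripping surrounding whitespace first is an assumption about the data.

```python
import re

# Two digit runs, each followed by its unit letters, separated by one space.
pattern = re.compile(r"(\d+)\w+ (\d+)\w+")

def years_months(text: str) -> str:
    """Hypothetical helper: turn '11yrs 11mon' into '11.11'."""
    # \1 is the first captured digit run (years), \2 the second (months).
    return pattern.sub(r"\1.\2", text.strip())

samples = ["0yrs 0mon", "11yrs 11mon ", "2yrs 2mon", "3yrs 5mon"]
results = [years_months(s) for s in samples]
print(results)  # ['0.0', '11.11', '2.2', '3.5']
```

If the column holds only strings, something like `df.Name.map(years_months)` should apply the same function to every row.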

StardustGogeta
  • Yes, I did something similar: ``>>> re.sub('[a-z]{3}\s','.', '11yrs 11mon', 1) '11.11mon' >>> re.sub('[a-z]{3}','', '11.11mon', 1) '11.11'`` – deon cagadoes Jul 25 '19 at 19:01

Try chaining rstrip to the end of what you already tried and that should strip out the trailing period:

In [5]: df.Name.str.replace(r'\D+', '.', regex=True).str.rstrip('.')
Out[5]: 
0      0.0
1    11.11
2      2.2
3      3.5
Name: Name, dtype: object
sjw
  • Good idea. For that matter, you could just do `[:-1]` to cut off the last character, too. – StardustGogeta Jul 25 '19 at 18:43
  • @StardustGogeta - although it's not explicitly mentioned in the question, my understanding is that `df` is a Pandas DataFrame, so `[:-1]` wouldn't work. – sjw Jul 25 '19 at 18:46
  • I am actually not familiar with Pandas, so I wouldn't know anything about that, but thank you for letting me know. – StardustGogeta Jul 25 '19 at 18:51