Regex on pandas column

Asked Aug 08 '23 at 07:50

Active Aug 08 '23 at 09:56

Viewed 20 times

A pandas column contains the strings 0.0(nan) and 0(nan). I want to get 0 in both cases. Here is my code.

import pandas as pd
import re

df = pd.DataFrame.from_dict({'col1': ['0.0(nan)','0(nan)']})
# Raw strings keep the regex backslashes from being read as (invalid) string escapes.
df['col2'] = df['col1'].astype(str).apply(lambda x: re.sub(r'(.*?)\(nan\)', r'\1', re.sub(r'(.*?)\.0*\(nan\)', r'\1', x)))
print(df)

Below is the output. I didn't know how to write one regex that handles both a trailing .0 and no decimal part before the (, which is why I nested one re.sub inside another. How can I do this with a single re.sub? Or are there other methods? Thank you.

       col1 col2
0  0.0(nan)    0
1    0(nan)    0

Edit: based on the comment from @mozway, this works with a single re.sub:

df['col2'] = df['col1'].astype(str).apply(lambda x: re.sub(r'(.*?)(?:\.0)?\(nan\)', r'\1', x))
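
Since the same pattern is applied to every row, the `apply` plus `re.sub` combination can probably be swapped for the vectorized `Series.str.replace`. This is only a sketch, assuming every value ends in `(nan)` and that the decimal part, when present, is exactly `.0`:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['0.0(nan)', '0(nan)']})

# Remove an optional ".0" together with the "(nan)" suffix.
# regex=True is passed explicitly because newer pandas versions
# treat the pattern as a literal string by default.
df['col2'] = df['col1'].astype(str).str.replace(r'(?:\.0)?\(nan\)', '', regex=True)
print(df)
```

Replacing with an empty string avoids the capture group and backreference, because only the unwanted suffix is matched.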

edited Aug 08 '23 at 09:56

asked Aug 08 '23 at 07:50

warem



Use `0(?:\.0)?\(nan\)` to match both – mozway Aug 08 '23 at 07:54
To extract: `str.extract('(\d+)(?:\.0)?\(nan\)')` – mozway Aug 08 '23 at 08:04
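
For reference, the `str.extract` suggestion from the comment might look like the sketch below on the sample frame. The `expand=False` argument is my addition so that a Series is returned instead of a one-column DataFrame. Rows that do not match the pattern would come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['0.0(nan)', '0(nan)']})

# Capture the integer part; the optional ".0" and the "(nan)" suffix
# are matched but not captured.
df['col2'] = df['col1'].astype(str).str.extract(r'(\d+)(?:\.0)?\(nan\)', expand=False)
print(df)
```

The extracted values are still strings, so a numeric column would need a further conversion such as `astype(int)`.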

0 Answers