replacing dots in pandas dataframe

Question

I have a dataframe as in the image (the numbers are really objects). Doing df.treasury_rate = pd.to_numeric(df.treasury_rate) predictably bombs. However, doing df.replace('.', np.nan) does not appear to get rid of the dot, so I am flummoxed. Any suggestions?

UPDATE: pd.to_numeric takes an errors keyword which, when set to 'coerce', does the right thing. However, I am still confused as to why the . is not getting replaced.

UPDATE 2 As text:

treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 .
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02 
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.02

Please post your data as text, not as an image. We do not want to retype it. — DYZ, Feb 07 '19 at 03:48

score 4 · Answer 1 · answered Feb 07 '19 at 04:02

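df.replace(r'^\.$', '0', regex=True, inplace=True)

I think you have to give a regex for the '.' character to replace it. The pattern is anchored with ^ and $ because an unanchored '\.' would also replace the decimal point inside string values such as '4.05'.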
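
To make the difference between literal and regex matching concrete, here is a minimal sketch. The sample Series and the variable names are made up for illustration; it assumes every value is stored as a string, as the question suggests.

```python
import pandas as pd

# Hypothetical sample: all values are strings held in an object column.
s = pd.Series(['4.05', '.', '4.03'], name='treasury_rate', dtype=object)

# Literal match (regex=False is the default): only cells equal to '.' change.
literal = s.replace('.', '0')

# Unanchored regex: every dot is replaced, including decimal points.
unanchored = s.replace(r'\.', '0', regex=True)

# Anchored regex: only cells that consist of a single dot change.
anchored = s.replace(r'^\.$', '0', regex=True)

print(literal.tolist())     # ['4.05', '0', '4.03']
print(unanchored.tolist())  # ['4005', '0', '4003']
print(anchored.tolist())    # ['4.05', '0', '4.03']
```

So a plain-string replace already targets the lone dot; a regex is only needed for partial matches, and then it should be anchored.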

Nusrath

I am confused, still - if regex is not true, why would it (pandas) interpret a dot as a regex? Seems to be a bug... – Igor Rivin Feb 07 '19 at 04:04
It's not a bug, as @cullzie mentions in his answer below, the important part was using the parameter inplace=True otherwise results aren't stored to the df. – Brandon Bertelsen Feb 07 '19 at 04:12
@Igor Rivin If regex is not True, pandas treats '.' as the literal character only, not as the regex dot: df.replace('.', '0'). inplace=True, on the other hand, just applies the change to your original dataframe; it works like df = df.replace('.', '0') – Nusrath Feb 07 '19 at 06:22

score 3 · Answer 2 · answered Feb 07 '19 at 03:57

In this case, creating a mask (subset) and replacing the string literal would be better than attempting to use replace. You'll probably also want to convert your series to numeric. If you know your data is clean apart from the "." character, you can also do it all in one step:

subs = df.treasury_rate == "."
df.loc[subs, 'treasury_rate'] = np.nan
# OR
df.treasury_rate = pd.to_numeric(df.treasury_rate, errors='coerce')

You could likely run the last line and have the dots replaced with NaN
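
As a self-contained sketch of both routes, assuming a small made-up frame whose values are all strings in an object column (the frame and the names two_step, is_dot and one_step are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample modelled on the question's data.
df = pd.DataFrame(
    {'treasury_rate': ['4.05', '.', '4.03']},
    index=pd.to_datetime(['1962-02-11', '1962-02-12', '1962-02-13']),
    dtype=object,
)

# Route 1: mask the placeholder, set it to NaN, then convert to numbers.
two_step = df.copy()
is_dot = two_step['treasury_rate'] == '.'
two_step.loc[is_dot, 'treasury_rate'] = np.nan
two_step['treasury_rate'] = pd.to_numeric(two_step['treasury_rate'])

# Route 2: let to_numeric turn anything unparseable into NaN.
one_step = pd.to_numeric(df['treasury_rate'], errors='coerce')

print(two_step['treasury_rate'].equals(one_step))
```

Note that errors='coerce' silently turns every unparseable value into NaN, not just the dot, so the mask route is the safer choice when the data may contain other junk you would want to notice.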

cullzie · Answer 3 · 2019-02-07T04:00:37.260

2

I think you just need to add inplace=True if you want to update the df object inplace:

df.replace('.', np.nan, inplace=True)

Otherwise df.replace returns a new dataframe, which you must assign to a variable to access:

new_df = df.replace('.', np.nan)
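
A minimal sketch of the three behaviours, using a made-up frame (the frame and the name still_has_dot are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with string values in an object column.
df = pd.DataFrame({'treasury_rate': ['4.05', '.', '4.03']}, dtype=object)

# The result is discarded here, so df itself is unchanged.
df.replace('.', np.nan)
still_has_dot = bool((df['treasury_rate'] == '.').any())

# Assigning the result keeps the cleaned copy under a new name.
new_df = df.replace('.', np.nan)

# inplace=True modifies df itself.
df.replace('.', np.nan, inplace=True)

print(still_has_dot)
print(new_df['treasury_rate'].isna().tolist())
print(df['treasury_rate'].isna().tolist())
```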

edited Feb 07 '19 at 04:00

answered Feb 07 '19 at 03:55

cullzie

Gonçalo Peres · Answer 4 · 2022-10-03T23:41:38.417

Given OP's dataframe

import pandas as pd
import datetime as dt

df = pd.DataFrame({'treasury_rate': [4.05, 4.05, 4.05, ".", 4.03, 4.03, 4.02, 4.02, 4.02, 4.0]},
                    index=[dt.datetime(1962, 2, 9), dt.datetime(1962, 2, 10), dt.datetime(1962, 2, 11), dt.datetime(1962, 2, 12), dt.datetime(1962, 2, 13), dt.datetime(1962, 2, 14), dt.datetime(1962, 2, 15), dt.datetime(1962, 2, 16), dt.datetime(1962, 2, 17), dt.datetime(1962, 2, 18)])


[Out]:
           treasury_rate
1962-02-09          4.05
1962-02-10          4.05
1962-02-11          4.05
1962-02-12             .
1962-02-13          4.03
1962-02-14          4.03
1962-02-15          4.02
1962-02-16          4.02
1962-02-17          4.02
1962-02-18           4.0

There are various ways one can go about replacing the dot with NaN.

Using regular expression (with Python's re)

import re
import numpy as np

df_new = df.replace(re.compile(r'\.'), np.nan)

Using a custom lambda function and np.nan

import numpy as np

df_new = df.treasury_rate.apply(lambda x: np.nan if x == '.' else x)

Using pandas.DataFrame.replace and np.nan

import numpy as np

df_new = df.replace('.', np.nan)

# or

df_new = df.replace(to_replace=r'\.', value=np.nan, regex=True)

In all three cases the output is

            treasury_rate
1962-02-09           4.05
1962-02-10           4.05
1962-02-11           4.05
1962-02-12            NaN
1962-02-13           4.03
1962-02-14           4.03
1962-02-15           4.02
1962-02-16           4.02
1962-02-17           4.02
1962-02-18           4.00

Notes:

If one wants to do the operations on the same dataframe, one can use inplace=True, such as
```
df.replace(re.compile(r'\.'), np.nan, inplace=True)
```

Pandas has its own null value, pd.NA. So, instead of np.nan, one might want to use pd.NA. For more information on the difference between the two, see: pd.NA vs np.nan for pandas. If one uses pd.NA, the output would be

           treasury_rate
1962-02-09          4.05
1962-02-10          4.05
1962-02-11          4.05
1962-02-12          <NA>
1962-02-13          4.03
1962-02-14          4.03
1962-02-15          4.02
1962-02-16          4.02
1962-02-17          4.02
1962-02-18           4.0

There are strong opinions on using .apply. If one wants to read some: When should I (not) want to use pandas apply() in my code?
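
As a rough sketch of the pd.NA note above, assuming a shortened, made-up version of the frame: the first line swaps the dot for pd.NA and leaves the column as objects, while the second is one possible alternative (not part of the original answer) that converts to pandas' nullable Float64 dtype, where missing values also display as <NA>.

```python
import pandas as pd

# Hypothetical shortened sample: floats mixed with a '.' placeholder.
df = pd.DataFrame({'treasury_rate': [4.05, '.', 4.03]})

# Replace the placeholder with pandas' own missing-value marker.
df_na = df.replace('.', pd.NA)

# Alternative (an assumption, not from the answer): coerce to numbers,
# then switch to the nullable Float64 extension dtype.
nullable = pd.to_numeric(df['treasury_rate'], errors='coerce').astype('Float64')

print(df_na)
print(nullable)
```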
