Given OP's dataframe
import pandas as pd
import datetime as dt
df = pd.DataFrame({'treasury_rate': [4.05, 4.05, 4.05, ".", 4.03, 4.03, 4.02, 4.02, 4.02, 4.0]},
index=[dt.datetime(1962, 2, 9), dt.datetime(1962, 2, 10), dt.datetime(1962, 2, 11), dt.datetime(1962, 2, 12), dt.datetime(1962, 2, 13), dt.datetime(1962, 2, 14), dt.datetime(1962, 2, 15), dt.datetime(1962, 2, 16), dt.datetime(1962, 2, 17), dt.datetime(1962, 2, 18)])
[Out]:
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 .
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.0
There are various ways one can go about replacing the dot with NaN.
Using regular expression (with Python's re
)
import re
df_new = df.replace(re.compile('\.'), np.nan)
Using a custom lambda function and np.nan
import numpy as np
df_new = df.treasury_rate.apply(lambda x: np.nan if x == '.' else x)
Using pandas.DataFrame.replace
and np.nan
import numpy as np
df_new = df.replace('.', np.nan)
# or
df_new = df.replace(to_replace=r'\.', value=np.nan, regex=True)
In all three cases the output is
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 NaN
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.00
Notes:
If one wants to do the operations on the same dataframe, one can use inplace=True
such as
df.replace(re.compile('\.'), np.nan)
Pandas has their own null value. So, instead of np.nan
, one might want to use pd.NA
. For more information on the difference between those two: pd.NA vs np.nan for pandas
If one uses pd.NA
, the output would be
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 <NA>
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.0
There are strong opinions on using .apply
. If one wants to read some: When should I (not) want to use pandas apply() in my code?