I'd like to learn the most idiomatic way to set all values of a data frame to NaN except those on the last business day of each month. I've worked out the following solution, but it feels clunky.
In case you're wondering about my original use case: I receive mixed daily and monthly data in one big data frame. The monthly columns simply repeat the same value on every day of the month, and I'd like to replace those repeated values with an interpolated estimate, e.g. using loess. To do that, I need every in-between date to hold NaN, so that the interpolation has gaps along the x-axis to fill.
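To make the use case concrete, here is a minimal sketch on made-up numbers. The `monthly` frame, its values, and all variable names are hypothetical and not the real data. It uses pandas' linear-in-time interpolation purely as a simple stand-in for loess, to show why the in-between dates need to be NaN first.

```python
import pandas as pd

# Hypothetical month-end observations (made-up values for illustration).
month_ends = pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"])
monthly = pd.DataFrame({"Col1": [10.0, 20.0, 30.0]}, index=month_ends)

# Daily calendar spanning the same period.
daily_index = pd.date_range(month_ends[0], month_ends[-1], freq="D")

# Reindexing onto the daily calendar leaves NaN on every in-between day.
sparse = monthly.reindex(daily_index)

# Linear interpolation weighted by time, standing in for a loess fit.
smooth = sparse.interpolate(method="time")
```

Only the three month-end rows of `sparse` carry values; `smooth` fills the days between them.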
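import numpy as np

# df is assumed to be a data frame with a daily DatetimeIndex
# get the values corresponding to the last business day of each month
# (note: pandas 2.2+ deprecates the 'BM' alias in favour of 'BME')
df_eofm = df.resample('BM').last()
# overwrite every value in the original data frame with NaN
df[:] = np.nan
# restore the saved values on the last business days
df.update(df_eofm)
print(df)
print(df.dropna())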
This produces the expected result:
            Col1  Col2  Col3
Date
1963-12-31  57.5   -28  0.89
1964-01-01   NaN   NaN   NaN
1964-01-02   NaN   NaN   NaN
1964-01-03   NaN   NaN   NaN
1964-01-04   NaN   NaN   NaN
...          ...   ...   ...
2020-03-11   NaN   NaN   NaN
2020-03-12   NaN   NaN   NaN
2020-03-13   NaN   NaN   NaN
2020-03-14   NaN   NaN   NaN
2020-03-15   NaN   NaN   NaN

[20530 rows x 3 columns]
            Col1  Col2   Col3
Date
1963-12-31  57.5   -28   0.89
1964-01-31    54   106   0.65
1964-02-28  57.1   126   0.68
1964-03-31  57.9   266   0.73
1964-04-30  60.2   144   0.72
...          ...   ...    ...
2019-10-31  47.8   136   0.11
2019-11-29  48.3   128   0.22
2019-12-31  48.1   266   0.37
2020-01-31  47.2   145  -0.08
2020-02-28  50.9   225  -0.45

[675 rows x 3 columns]
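
For comparison, here is a rough sketch of a mask-based variant that avoids the temporary frame and the `update` call. It runs on synthetic data: the `demo` frame and the names `bdays`, `last_bdays`, and `masked` are hypothetical. It assumes the index contains every calendar day, so that each month's last business day is actually present. I'm not sure it counts as more idiomatic.

```python
import numpy as np
import pandas as pd

# Synthetic daily frame (hypothetical data) covering two full months.
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
demo = pd.DataFrame({"Col1": np.arange(len(idx), dtype=float)}, index=idx)

# Business days are Monday-Friday (holidays are ignored here).
bdays = idx[idx.dayofweek < 5]

# Latest business day within each calendar month.
last_bdays = bdays.to_series().groupby(bdays.to_period("M")).max()

# Blank out every row that is not one of those dates.
masked = demo.copy()
masked.loc[~masked.index.isin(last_bdays)] = np.nan

print(masked.dropna())
```

Unlike `resample(...).last()`, this keeps only values observed exactly on the last weekday of the month and does not fall back to an earlier observation if that date is missing.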