What's the most idiomatic way to set all values to NaN except the end of the month?

Question

I'd like to learn the most idiomatic way to set all values of a data frame to NaN except the values corresponding to the last business day of the month. I've worked out the following solution but it feels clunky.

If you're wondering what my original use-case is ... I get mixed daily and monthly data into one big data frame. I extract the monthly data which is basically repeating the same value within each month and I'd like to replace the dull repeated values with an interpolated estimation e.g. using loess. To that end I need to fill in missing values for getting all the in-between x-axis NA values.

# get the values corresponding to the last business day of each month
df_eofm = df.resample('BM').last()
# fill the original data frame with NaN's
df[:] = np.nan
# now try to set the last business days to the values we saved
df.update(df_eofm)
print(df)
print(df.dropna())

This produces the expected result:

                        Col1                    Col2            Col3
Date                                                                
1963-12-31              57.5                     -28            0.89
1964-01-01               NaN                     NaN             NaN
1964-01-02               NaN                     NaN             NaN
1964-01-03               NaN                     NaN             NaN
1964-01-04               NaN                     NaN             NaN
...                      ...                     ...             ...
2020-03-11               NaN                     NaN             NaN
2020-03-12               NaN                     NaN             NaN
2020-03-13               NaN                     NaN             NaN
2020-03-14               NaN                     NaN             NaN
2020-03-15               NaN                     NaN             NaN

[20530 rows x 3 columns]
                        Col1                    Col2            Col3
Date                                                                
1963-12-31              57.5                     -28            0.89
1964-01-31                54                     106            0.65
1964-02-28              57.1                     126            0.68
1964-03-31              57.9                     266            0.73
1964-04-30              60.2                     144            0.72
...                      ...                     ...             ...
2019-10-31              47.8                     136            0.11
2019-11-29              48.3                     128            0.22
2019-12-31              48.1                     266            0.37
2020-01-31              47.2                     145           -0.08
2020-02-28              50.9                     225           -0.45

[675 rows x 3 columns]

yatu · Accepted Answer · 2020-06-14T18:40:23.553

2

You could use is_month_end and index the dataframe with the resultant boolean series:

df[~df.index.is_month_end] = np.nan

For the last business day, using this answer we could do something like:

def is_business_day(date):
    return bool(len(pd.bdate_range(date, date)))

last_bus = (df.index.to_frame()[0]    
              .map(is_business_day)
              .groupby(df.index.month)
              .transform(lambda x: x.last_valid_index()))
df[df.index==last_bus] = np.nan

edited Jun 14 '20 at 18:40

answered Jun 14 '20 at 18:25

yatu

86,083
12
84
139

1

it is excellent answer +1. But it seems OP looking for `... the last business day of the month...`. This is normal month end – Andy L. Jun 14 '20 at 18:31
@AndyL. exactly good point! though it produces exactly the same result as the clunky approach it would be nice to know the technically exact equivalent to `'BM'` – SkyWalker Jun 14 '20 at 18:34
@SkyWalker: this is an succinct and nice answer. This flags regular month-end such as 1/31, 28(or 29)/2, 31/3... In the other hand, business day is from Mon - Fri, so assume on some year 31/3 falls on Sunday, `the last business day of the month` would be 29/3. That is the difference – Andy L. Jun 14 '20 at 18:41
1

I think the update should do, thx for pointing out @andy – yatu Jun 14 '20 at 18:42
1

@yatu: yeah, the 2nd one does :). I already upvoted. Otherwise, I would upvote it again. – Andy L. Jun 14 '20 at 18:44

What's the most idiomatic way to set all values to NaN except the end of the month?

1 Answers1