How to subset dataframe on dates?

Question

I have a panel dataframe in R with many rows. I want to subset it to include only the last 10 days of each month (or the last observation 10 days before the end of the month). However, the months vary, and not every month includes end-of-month observations. I need a subset that contains the final 10 (or 5) days of every month.

library(dplyr)
library(lubridate)

CIV50s = CIV50sub %>%
  select(cusip, date, impl_volatility) %>%
  group_by(year(date), month(date), cusip) %>%
  summarize(impl_volatility = tail(impl_volatility, 1)) %>% 
  mutate(date = make_date(`year(date)`, `month(date)`))

I have tried this, but it only gives me the last observation of each month. I need either the last 10 days or the last observation 10 days before the end of the month.

My dataset looks like this: (the sample was posted as an image and is not reproduced here)

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Don't post pictures of data since we can't easily import that into R. Share a small `dput()` of your data instead. — MrFlick, Aug 23 '19 at 15:32
Can your data have more than one observation for a given day? — Kevin Troy, Aug 23 '19 at 15:32
@KevinTroy It only has one observation per day per secid (ID). However, there are multiple secids, so there will be multiple observations for the same day. — incognito, Aug 23 '19 at 15:39
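Given that each ID has at most one observation per day, both readings of the requirement in the question can be sketched with grouped filters. This is a minimal sketch, not the asker's or answerer's method: it assumes columns named `cusip`, `date` (class `Date`) and `impl_volatility` as in the pipeline above, and the toy data frame `panel` (weekdays only, so some months lack a row on their last calendar day) plus the helper columns `yr`, `mo`, `month_end` and `cutoff` are hypothetical. "10 days before the end of the month" is assumed to mean at least 10 calendar days before the month's last calendar day.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy panel (not the asker's data): two IDs observed on weekdays
# only, so some months have no row on their last calendar day.
dates <- seq(ymd("2019-01-01"), ymd("2019-03-31"), by = "day")
dates <- dates[!wday(dates) %in% c(1, 7)]  # drop Sundays (1) and Saturdays (7)
panel <- data.frame(cusip = rep(c("A", "B"), each = length(dates)),
                    date  = rep(dates, times = 2),
                    stringsAsFactors = FALSE)
panel$impl_volatility <- seq_len(nrow(panel)) / 100  # placeholder values

# Reading 1: the last 10 observations per cusip and month.
last10_obs <- panel %>%
  arrange(cusip, date) %>%
  group_by(cusip, yr = year(date), mo = month(date)) %>%
  filter(row_number() > n() - 10) %>%  # keep only the final 10 rows of each group
  ungroup()

# Reading 2: per cusip and month, the last observation dated at least
# 10 calendar days before the month's final calendar day (assumed meaning).
before_cutoff <- panel %>%
  mutate(month_end = floor_date(date, "month") + months(1) - days(1),
         cutoff    = month_end - days(10)) %>%
  filter(date <= cutoff) %>%
  group_by(cusip, yr = year(date), mo = month(date)) %>%
  filter(date == max(date)) %>%  # one row per group, since there is one row per day per ID
  ungroup()
```

Reading 1 counts rows, so it adapts to months whose final days are missing; Reading 2 counts calendar days back from the month's end, so it does not depend on which days happen to be present.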

score 0 · Accepted Answer · answered Aug 23 '19 at 16:09

Here are two possible solutions. The first is quick but imprecise: extract the day of the month from each date and keep the days from a fixed cutoff (for example, the 20th or 21st) onward. It isn't exact because months have different lengths, so the number of days kept varies from month to month.

library(dplyr)
library(lubridate)

df <- data.frame(t=seq(ymd('2018-01-01'),ymd('2019-01-01'),by='days'))

#extract day of month
df$day <- as.numeric(format(df$t,'%d'))

df %>% filter(day>=20) # can change this to 21 or other number

             t day
1   2018-01-20  20
2   2018-01-21  21
3   2018-01-22  22
4   2018-01-23  23
5   2018-01-24  24
6   2018-01-25  25
7   2018-01-26  26

The other option is to add a column with the length of each month, compute how many days each date lies before the month's last day, and filter on that difference. Both options still work when observations are missing for the last days of a month.

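df %>% mutate(month=as.numeric(format(t,'%m')),
              month.length=case_when(month %in% c(1,3,5,7,8,10,12)~31,
                         month==2 & leap_year(t)~29,  # February in leap years
                         month==2~28,
                         TRUE~30),
              diff=month.length-day) %>% 
  # diff <= 10 keeps 11 days (diff 0 to 10); use diff < 10 for exactly 10
  filter(diff<=10)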

             t day month month.length diff
1   2018-01-21  21     1           31   10
2   2018-01-22  22     1           31    9
3   2018-01-23  23     1           31    8
4   2018-01-24  24     1           31    7
5   2018-01-25  25     1           31    6
6   2018-01-26  26     1           31    5
7   2018-01-27  27     1           31    4
8   2018-01-28  28     1           31    3
9   2018-01-29  29     1           31    2
10  2018-01-30  30     1           31    1
11  2018-01-31  31     1           31    0
12  2018-02-18  18     2           28   10
13  2018-02-19  19     2           28    9
14  2018-02-20  20     2           28    8
15  2018-02-21  21     2           28    7
16  2018-02-22  22     2           28    6
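A variant of the second option reads each month's length from lubridate's `days_in_month()` instead of listing month lengths by hand, so February gets 29 days in leap years, and uses `< 10` to keep exactly 10 calendar days per month. It is a sketch; the data frame `df2` and the column name `days_to_end` are made up for illustration, and 2020 is used because it is a leap year.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily series spanning a leap-year February (2020 has Feb 29).
df2 <- data.frame(t = seq(ymd("2020-01-01"), ymd("2020-03-31"), by = "day"))

last10 <- df2 %>%
  # days_in_month() returns a named integer; as.integer() drops the names.
  # days_to_end is 0 on the month's last calendar day.
  mutate(days_to_end = as.integer(days_in_month(t)) - mday(t)) %>%
  filter(days_to_end < 10)  # exactly 10 calendar days per month
```

Because the filter only looks at each row's own date, it can be applied unchanged to a panel with several `cusip` values per day.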
