R: Only keeping the first observation of the month in dataset

Question

I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset returns indexes corresponding to that date.

DATE	Asset_1	Asset_2	Asset_3	Asset_4
2000-01-01	1000	300	2900	NA
.....
2000-01-31	1100	350	2950	NA
2000-02-02	1200	330	2970	100
...
2000-02-28	1200	360	3000	200
2000-03-01	1200	370	3500	300

I want to make this into a monthly dataset by only keeping the first observation of the month.

I have come up with the following script:

library(dplyr)
library(lubridate)
monthly <- daily %>% filter(day(DATE) == 1)

However, the problem with this is that it doesnt work for months where the first day of the month is not a trading date (aka it is missing from the daily dataset).

So when I run the command, those months where the first day of the month doesn't exist are excluded from my dataset.

M.Viking · Accepted Answer · 2023-02-08T15:53:39.063

2

If the data is always ordered, you could group by year\month, then keep (slice) the first record from each group. Like:

df<-data.frame(mydate=as.Date("2023-01-01")+1:45)

library(tidyverse)
library(lubridate)

df %>% 
  group_by(ym=paste(year(mydate), month(mydate))) %>% 
  #group_by(year(mydate), month(mydate)) %>% 
  slice_head(n=1)

edited Feb 08 '23 at 15:53

answered Feb 08 '23 at 15:34

M.Viking

5,067
4
17
33

1

the data is indeed always ordered and this did the trick. thanks! – datgoaltho Feb 08 '23 at 15:48

score 2 · Answer 2 · answered Feb 08 '23 at 15:39

2

Use slice_min

library(dplyr) # version 1.1.0 or later
library(zoo)

daily %>% 
  mutate(ym = as.yearmon(DATE)) %>%
  slice_min(DATE, by = ym)

answered Feb 08 '23 at 15:39

G. Grothendieck

254,981
17
203
341

R: Only keeping the first observation of the month in dataset

2 Answers2