Since the dataset is cumulative for each month, I want to keep only the last row of each month for each of the 50 states. The Sample Dataset Snippet here is what the top looks like sorted by name. What functions do I need from tidyverse or dplyr to get it?
Asked
Active
Viewed 42 times
0
-
1Welcome to Stack Overflow. We cannot copy/paste data from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including example data in a plain text format - for example the output from `dput(yourdata)`. – neilfws Nov 24 '21 at 23:34
1 Answers
0
Let's use this dummy
data looks similar to your data,
dummy <- data.frame(
name = c("Alabama","Alabama","Alabama","Alabama","Alabama","Alabama"),
bla = c(1:6),
as_of_date = c("3/26/2020","3/31/2020","4/6/2020","4/13/2020","4/21/2020","4/28/2020"),
month = c(3,3,4,4,4,4)
)
name bla as_of_date month
1 Alabama 1 3/26/2020 3
2 Alabama 2 3/31/2020 3
3 Alabama 3 4/6/2020 4
4 Alabama 4 4/13/2020 4
5 Alabama 5 4/21/2020 4
6 Alabama 6 4/28/2020 4
You may try,
library(dplyr)
dummy %>%
mutate(as_of_date = as.Date(as_of_date, format = "%m/%d/%Y")) %>%
arrange(name, as_of_date) %>% # to order by date, state, in case that your data is not ordered as an image
group_by(name, month) %>%
filter(row_number() == n())
name bla as_of_date month
<chr> <int> <date> <dbl>
1 Alabama 2 2020-03-31 3
2 Alabama 6 2020-04-28 4

Park
- 14,771
- 6
- 10
- 29