0

Since the dataset is cumulative for each month, I want to keep only the last row of each month for each of the 50 states. The Sample Dataset Snippet here is what the top looks like sorted by name. What functions do I need from tidyverse or dplyr to get it?

speeeZy
  • 1
  • 1
  • 1
    Welcome to Stack Overflow. We cannot copy/paste data from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including example data in a plain text format - for example the output from `dput(yourdata)`. – neilfws Nov 24 '21 at 23:34

1 Answers1

0

Let's use this dummy data looks similar to your data,

dummy <- data.frame(
  name = c("Alabama","Alabama","Alabama","Alabama","Alabama","Alabama"),
  bla = c(1:6),
  as_of_date = c("3/26/2020","3/31/2020","4/6/2020","4/13/2020","4/21/2020","4/28/2020"),
  month = c(3,3,4,4,4,4)
)

     name bla as_of_date month
1 Alabama   1  3/26/2020     3
2 Alabama   2  3/31/2020     3
3 Alabama   3   4/6/2020     4
4 Alabama   4  4/13/2020     4
5 Alabama   5  4/21/2020     4
6 Alabama   6  4/28/2020     4

You may try,

library(dplyr)
dummy %>%
  mutate(as_of_date = as.Date(as_of_date, format = "%m/%d/%Y")) %>%
  arrange(name, as_of_date) %>% # to order by date, state, in case that your data is not ordered as an image
  group_by(name, month) %>%
  filter(row_number() == n())

  name      bla as_of_date month
  <chr>   <int> <date>     <dbl>
1 Alabama     2 2020-03-31     3
2 Alabama     6 2020-04-28     4
Park
  • 14,771
  • 6
  • 10
  • 29