This is a really silly question, but I cannot figure out what I am doing wrong.

I have a data frame of multiple individuals, each of whom may have data recorded over multiple years. I am trying to create a second data frame that summarizes the year each individual entered my dataset (and ideally the year they left, i.e. the first and last year I have data for them).

df1 <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                  year = c("2021","2021","2022","2023","2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023","2021","2021","2022","2023"),
                  x = c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4),
                  y = c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4))

I have tried to group_by(ID) and then summarize the minimum year the following way:


IDs<-df1 %>% 
  group_by(ID) %>% 
  summarise(strYear = min(year))

This ends up giving me only one row, containing the overall minimum year. I would like one row for each unique ID, with the minimum year corresponding to that ID.

Thanks in advance!

  • 2
    When `mutate` or `summarize` doesn't respect `group_by`, it's almost always a function name conflict. You probably loaded (or some package loaded) the `plyr` package, which has functions with those same names that don't work with `group_by`. You can specify `dplyr::summarise(...)` and/or you can review your workflow and make sure you load `plyr` *before* `dplyr` (if it needs loading at all). – Gregor Thomas Mar 07 '23 at 20:52
  • 1
    I was going to say, your code works for me. I think Gregor explained it well. In addition, you should include your libraries next time you ask a question. – John Polo Mar 07 '23 at 20:54
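
For anyone landing here with the same symptom, the suggestion in the comments can be sketched roughly as follows. This is only a minimal sketch, assuming `dplyr` is installed and that a masked `summarise` really is the cause. The column names `firstYear` and `lastYear` are made up for illustration, the `x` and `y` columns are left out for brevity, and `year` is converted to integer so that `min` and `max` compare numbers rather than strings.

```r
library(dplyr)

# Same ID/year layout as the question's df1 (x and y omitted for brevity)
df1 <- data.frame(
  ID   = rep(1:5, each = 4),
  year = rep(c("2021", "2021", "2022", "2023"), times = 5)
)

# Diagnostic: reports which package's summarise() is found first
# on the search path in the current session
environmentName(environment(summarise))

# Namespace-qualified calls use dplyr's versions even if another
# package (such as plyr) masks group_by() or summarise()
IDs <- df1 %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(
    firstYear = min(as.integer(year)),  # hypothetical name: first year with data
    lastYear  = max(as.integer(year))   # hypothetical name: last year with data
  )

IDs
```

With the namespace prefix the result should have one row per ID rather than a single row. Qualifying the calls is the least invasive option; changing the load order so that `plyr` comes before `dplyr`, as the comment suggests, would fix it for the whole script instead.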
