Someone asked this already in a simpler version here, but I cannot quite get it to work for my case.
I have observational data on a number of individuals across multiple years for a set of questions, but not everyone is asked every question every year. I want to generate a new dataframe that has, for each individual, the most recent non-missing answer to each question.
The data looks like this:
df <- data.frame(
  individual = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  time = rep(1:4, times = 3),
  questionA = c("Yes", NA, "No", NA, "No", NA, "No", "Yes", "No", NA, NA, "No"),
  questionB = c(3, 5, 4, 5, 8, 6, 7, 4, 3, 1, 5, NA)
)
The resulting dataframe for this example should look like this:
most_recent <- data.frame(individual = c("A", "B", "C"), questionA = c("No", "Yes", "No"), questionB = c(5, 4, 5))
Ideally I am looking for a dplyr solution. Thank you!
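For reference, here is a minimal sketch of one way this might be approached, not necessarily the best one. It assumes dplyr 1.0 or later (for `across()`) and that "most recent" means the last non-NA value per individual once rows are ordered by `time`. The result name `most_recent_sketch` is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  individual = rep(c("A", "B", "C"), each = 4),
  time = rep(1:4, times = 3),
  questionA = c("Yes", NA, "No", NA, "No", NA, "No", "Yes", "No", NA, NA, "No"),
  questionB = c(3, 5, 4, 5, 8, 6, 7, 4, 3, 1, 5, NA)
)

most_recent_sketch <- df %>%
  # Assumption: a larger `time` means a more recent observation
  arrange(individual, time) %>%
  group_by(individual) %>%
  # For each question column, drop NAs and keep the last remaining value;
  # dplyr::last() returns NA if an individual has no answers at all
  summarise(
    across(c(questionA, questionB), ~ last(.x[!is.na(.x)])),
    .groups = "drop"
  )
```

Note that `summarise()` returns a tibble rather than a base data.frame; wrapping the result in `as.data.frame()` would convert it if that matters.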