I'm working on a longitudinal dataset where there are multiple records for the same year, but sometimes the value is missing. So, using this data:
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001,
          1999, 2000, 2001, 2001, 2001,
          1999, 2000, 2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced",
             "single", "single", "single", NA, NA,
             "married", NA, "married", "divorced", "divorced")
df <- data.frame(id, year, marstat)
   id year  marstat
1   1 1999  married
2   1 1999       NA
3   1 2000  married
4   1 2001  married
5   1 2001 divorced
6   2 1999   single
7   2 2000   single
8   2 2001   single
9   2 2001       NA
10  2 2001       NA
11  3 1999  married
12  3 2000       NA
13  3 2001  married
14  3 2002 divorced
15  3 2003 divorced
I want to fill NAs with existing data for that person if there's information about their marital status for that same year. So for ID 1, there's an NA in row 2, but there's data for that person for the same year (row 1), so I'd want it to say "married" there. Similarly for ID 2, rows 9 and 10 should say "single", because the person was single in 2001 based on the data in row 8.
I don't want to simply drop the rows with missing values, because my actual data has many more columns.
I also don't want to fill based on values from earlier or later years; an NA should be filled only if the year is the same.
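To make the intended behaviour concrete, here is a rough base-R sketch of the kind of fill described above. It is only one possible approach, not necessarily the best one. The helper name `fill_within_group` and the new column `marstat_filled` are made up for illustration, and the sketch assumes that when an id-year group has several non-missing values, taking the first one is acceptable.

```r
# Rebuild the example data from the question.
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001,
          1999, 2000, 2001, 2001, 2001,
          1999, 2000, 2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced",
             "single", "single", "single", NA, NA,
             "married", NA, "married", "divorced", "divorced")
df <- data.frame(id, year, marstat, stringsAsFactors = FALSE)

# Hypothetical helper: within one group, replace NAs with the first
# non-missing value. If the whole group is NA, it is returned unchanged.
# Assumption: using the first known value is fine when a group has several.
fill_within_group <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) > 0) {
    x[is.na(x)] <- known[1]
  }
  x
}

# ave() applies the helper separately to every id-year combination,
# so values never leak across years or across people.
df$marstat_filled <- ave(df$marstat, df$id, df$year, FUN = fill_within_group)

print(df)
```

With the example data, row 2 becomes "married" and rows 9 and 10 become "single". Row 12 stays NA because ID 3 has no other record for 2000, and no rows are dropped, so any additional columns are left untouched.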