Filling missing values based on a condition

Question

I'm working on a longitudinal dataset where there's multiple data for the same year, but sometime's it's missing. So, using this data:

id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001, 1999, 2000, 
2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced", "single", "single", "single", NA, NA, "married", NA, "married", "divorced", "divorced")
df <- data.frame(id , year , marstat)

   id year  marstat
1   1 1999  married
2   1 1999     NA
3   1 2000  married
4   1 2001  married
5   1 2001 divorced
6   2 1999   single
7   2 2000   single
8   2 2001   single
9   2 2001     NA
10  2 2001     NA
11  3 1999  married
12  3 2000     NA
13  3 2001  married
14  3 2002 divorced
15  3 2003 divorced

I want to fill NAs with existing data for that person if there's information about the marital status for that year. So for ID 1, there's an NA in row 2, but there's data for that person for the same year, so I'd want it to say "married" there. Similarly for ID, row 9 and 10, it should say "single" because the person was single in 2001 based on data from row 8.

I don't just want to drop the rows with missingness as in my actual data I have a lot more columns.

I don't want to fill it based on previous/later values. Only if the year in the same.

It is not clear whether OP wants to use dplyr, and also not clear whether he wants to use information from only previous and next values or the entire DF. Seems like a bad choice of duplicate. — erocoar, Apr 30 '18 at 11:18
Have edited it to make it clear it's not a duplicate. I don't want to fill it with previous or next values, I want to fill them conditionally based on year. — T.P., Apr 30 '18 at 12:18
`df$marstat[is.na(df$marstat)] <- apply(df[is.na(df$marstat), ], 1, function(x) { select <- df[df$id == x["id"] & df$year == x["year"], "marstat"] select[!is.na(select)][1] } )` this should work for you — erocoar, Apr 30 '18 at 12:29

score 0 · Accepted Answer · answered Apr 30 '18 at 11:08

you can try

library(tidyverse)
df %>% 
  group_by(id, year) %>% 
  mutate(marstat2=paste(na.omit(marstat), collapse = ","),
         marstat3=case_when(is.na(marstat) ~  marstat2, 
                            TRUE ~ as.character(marstat)))
# A tibble: 15 x 5
# Groups:   id, year [11]
   id     year marstat  marstat2         marstat3
   <fct> <dbl> <fct>    <chr>            <chr>   
 1 1     1999. married  married          married 
 2 1     1999. NA       married          married 
 3 1     2000. married  married          married 
 4 1     2001. married  married,divorced married 
 5 1     2001. divorced married,divorced divorced
 6 2     1999. single   single           single  
 7 2     2000. single   single           single  
 8 2     2001. single   single           single  
 9 2     2001. NA       single           single  
10 2     2001. NA       single           single  
11 3     1999. married  married          married 
12 3     2000. NA       ""               ""      
13 3     2001. married  married          married 
14 3     2002. divorced divorced         divorced
15 3     2003. divorced divorced         divorced

Added different columns to show the oppertunities of that approach.

Thank you. It doesn't appear to be working on my actual data, however, and I think that's because I have 5 marstats in the actual data. Sorry, should have made that clear in the original question. — T.P., Apr 30 '18 at 12:14

Filling missing values based on a condition

1 Answers1