I have a dataframe like this:
ID year fcmstat secmstat mstat
138 4 1998 NA NA 1
139 4 1999 NA NA 1
140 4 2000 NA NA 1
141 4 2001 NA NA 1
142 4 2002 NA NA 1
143 4 2003 2 NA 2
144 4 2004 NA NA NA
145 4 2005 NA NA NA
146 4 2006 NA 3 3
147 4 2007 NA NA NA
375 19 2001 NA NA 2
376 19 2002 6 NA 6
377 19 2003 NA NA NA
378 19 2004 NA 5 5
379 19 2005 NA NA NA
380 19 2006 NA NA 1
fcmstat
: type of first marital status change
secmstat
: type of second marital status change
first marital status, for ID 4(19), fsmstat
was changed in 2003(2002) and second marital status secmstat
was changed in 2006(2004). So, for ID 4, in 2004 and 2005 marital status was same as fcmstat
of 2003 and for ID 19, 2003's mstat
should be same as fcmstat
of 2002.
I want to fill in t he last column as follows:
ID year fcmstat secmstat mstat
138 4 1998 NA NA 1
139 4 1999 NA NA 1
140 4 2000 NA NA 1
141 4 2001 NA NA 1
142 4 2002 NA NA 1
143 4 2003 2 NA 2
144 4 2004 NA NA 2
145 4 2005 NA NA 2
146 4 2006 NA 3 3
147 4 2007 NA NA NA
375 19 2001 NA NA 2
376 19 2002 6 NA 6
377 19 2003 NA NA 6
378 19 2004 NA 5 5
379 19 2005 NA NA NA
380 19 2006 NA NA 1
Also, before any first change, the mstat
should be same as before. Consider the following case.
ID year fcmstat secmstat mstat
1171 61 1978 NA NA 0
1172 61 1979 NA NA 0
1173 61 1980 NA NA 0
1174 61 1981 NA NA 0
1175 61 1982 NA NA 0
1176 61 1983 NA NA NA
1177 61 1984 NA NA NA
1178 61 1985 1 NA 1
1179 61 1986 NA NA 1
1180 61 1987 NA NA 1
the first change was in 1985. So, the missing mstat
in 1984 and 1983 should be same as mstat
of 1982. SO for this case, my desired output is:
ID year fcmstat secmstat mstat
1171 61 1978 NA NA 0
1172 61 1979 NA NA 0
1173 61 1980 NA NA 0
1174 61 1981 NA NA 0
1175 61 1982 NA NA 0
1176 61 1983 NA NA 0
1177 61 1984 NA NA 0
1178 61 1985 1 NA 1
1179 61 1986 NA NA 1
1180 61 1987 NA NA 1
As suggested by Schilker the code df$mstat_updated<-na.locf(df$mstat)
gives the following:
ID year fcmstat secmstat mstat mstat_updated
138 4 1998 NA NA 1 1
139 4 1999 NA NA 1 1
140 4 2000 NA NA 1 1
141 4 2001 NA NA 1 1
142 4 2002 NA NA 1 1
143 4 2003 2 NA 2 2
144 4 2004 NA NA NA 2
145 4 2005 NA NA NA 2
146 4 2006 NA 3 3 3
147 4 2007 NA NA NA 3
148 4 2008 NA NA NA 3
However, I do want to fill in mstat for 2004 and 2005 but not in 2007 and 2008. I want to fill in NA's only between first marstat change, fcmstat
and second marstat, secmstat
change.