0

I have a dataframe like this:

    ID  year    fcmstat secmstat  mstat
138 4   1998    NA          NA      1
139 4   1999    NA          NA      1
140 4   2000    NA          NA      1
141 4   2001    NA          NA      1
142 4   2002    NA          NA      1
143 4   2003    2           NA      2
144 4   2004    NA          NA      NA
145 4   2005    NA          NA      NA
146 4   2006    NA          3       3
147 4   2007    NA          NA     NA

375 19  2001    NA          NA      2
376 19  2002    6           NA      6
377 19  2003    NA          NA      NA
378 19  2004    NA          5       5
379 19  2005    NA          NA      NA
380 19  2006    NA          NA      1


fcmstat: type of first marital status change

secmstat: type of second marital status change

first marital status, for ID 4(19), fsmstat was changed in 2003(2002) and second marital status secmstat was changed in 2006(2004). So, for ID 4, in 2004 and 2005 marital status was same as fcmstat of 2003 and for ID 19, 2003's mstat should be same as fcmstat of 2002.

I want to fill in t he last column as follows:

    ID  year    fcmstat secmstat  mstat
138 4   1998    NA          NA      1
139 4   1999    NA          NA      1
140 4   2000    NA          NA      1
141 4   2001    NA          NA      1
142 4   2002    NA          NA      1
143 4   2003    2           NA      2
144 4   2004    NA          NA      2
145 4   2005    NA          NA      2
146 4   2006    NA          3       3
147 4   2007    NA          NA     NA

375 19  2001    NA          NA      2
376 19  2002    6           NA      6
377 19  2003    NA          NA      6
378 19  2004    NA          5       5
379 19  2005    NA          NA      NA
380 19  2006    NA          NA      1

Also, before any first change, the mstatshould be same as before. Consider the following case.

        ID  year    fcmstat secmstat    mstat
1171    61  1978    NA      NA           0
1172    61  1979    NA      NA           0
1173    61  1980    NA      NA           0
1174    61  1981    NA      NA           0
1175    61  1982    NA      NA           0
1176    61  1983    NA      NA          NA
1177    61  1984    NA      NA          NA
1178    61  1985    1       NA           1
1179    61  1986    NA      NA           1
1180    61  1987    NA      NA           1

the first change was in 1985. So, the missing mstat in 1984 and 1983 should be same as mstat of 1982. SO for this case, my desired output is:

       ID   year    fcmstat secmstat    mstat
1171    61  1978    NA      NA           0
1172    61  1979    NA      NA           0
1173    61  1980    NA      NA           0
1174    61  1981    NA      NA           0
1175    61  1982    NA      NA           0
1176    61  1983    NA      NA           0
1177    61  1984    NA      NA           0
1178    61  1985    1       NA           1
1179    61  1986    NA      NA           1
1180    61  1987    NA      NA           1

As suggested by Schilker the code df$mstat_updated<-na.locf(df$mstat) gives the following:

    ID  year    fcmstat secmstat  mstat mstat_updated
138 4   1998    NA          NA      1     1
139 4   1999    NA          NA      1     1
140 4   2000    NA          NA      1     1
141 4   2001    NA          NA      1     1
142 4   2002    NA          NA      1     1
143 4   2003    2           NA      2     2 
144 4   2004    NA          NA      NA     2   
145 4   2005    NA          NA      NA     2
146 4   2006    NA          3       3     3 
147 4   2007    NA          NA     NA     3
148 4   2008    NA          NA     NA     3

However, I do want to fill in mstat for 2004 and 2005 but not in 2007 and 2008. I want to fill in NA's only between first marstat change, fcmstat and second marstat, secmstat change.

chfs
  • 57
  • 6
  • Looks like all the values needed are already in `mstat`. I recommend reading the help page for [zoo::na.locf](https://www.rdocumentation.org/packages/zoo/versions/1.8-6/topics/na.locf) – Jonny Phelps Oct 29 '19 at 16:50
  • I have also provided my desired outcome. That's why you see in the 2nd and the 4th tables, NA's are replaced with values. Thanks so much for your reference to zoo::na.locf . – chfs Oct 30 '19 at 04:08

1 Answers1

0

As I mentioned in my comment this a duplicate of here

library(zoo)

df<-data.frame(ID=c('4','4','4','4'),
              year=c(2003,2004,2005,2006),
              mstat=c(2,NA,NA,3))
df$mstat<-na.locf(df$mstat)
Schilker
  • 505
  • 2
  • 11
  • Thank you so much for your response. This is close but not exactly what I want. I have added further details in my original post. Any help will be much appreciated. – chfs Oct 30 '19 at 04:03