Thank you for your time. I have the following data (snippet). It comes from longitudinal data, reshaped into a wide-format file of work status; each column represents one month, each row an individual.

Code:
j1992_12 = c(1, 10, 1, 7, 1, 1)
j1993_01 = c( 1, 1, 1, NA, 3, 1) 
j1993_02 = c( 1, 1, 1, NA, 3, 1) 
j1993_03 = c( 1, 8, 1, NA, 3, 1) 
j1993_04 = c( 1, 8, 1, NA, 3, 1) 
j1993_05 = c( 1, 8, 1, NA, 3, 1) 
j1993_06 = c( 1, 8, 1, NA, 3, 1) 
j1993_07 = c( 1, 8, 1, NA, 3, 1) 
j1993_08 = c( 1, 8, 1, NA, 3, 1) 
j1993_09 = c( 1, 8, 1, NA, 3, 1) 
j1993_10 = c( 1, 8, 1, NA, 3, 1) 
j1993_11 = c( 1, 8, 1, NA, 3, 1) 
j1993_12 = c( 1, 8, 1, NA, 3, 1) 
j1994_01 = c( 1, 8, 1, 7, 3, 1) 


DF93= data.frame(j1992_12, j1993_01, j1993_02, j1993_03, j1993_04, j1993_05, j1993_06, j1993_07, j1993_08, j1993_09, j1993_10, j1993_11, j1993_12, j1994_01)


Output:
       j1992_12   j1993_01 j1993_02 j1993_03 j1993_04 j1993_05 j1993_06 j1993_07 j1993_08 j1993_09 j1993_10 j1993_11 j1993_12 j1994_01
    R1        1          1        1        1        1        1        1        1        1        1        1        1        1        1
    R2       10          1        1        8        8        8        8        8        8        8        8        8        8        8
    R3        1          1        1        1        1        1        1        1        1        1        1        1        1        1
    R4        7         NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA        7
    R5        1          3        3        3        3        3        3        3        3        3        3        3        3        3
    R6        1          1        1        1        1        1        1        1        1        1        1        1        1        1

I want to check for occurrences of 12 consecutive months with NA, as in row R4. I would then like to check whether the last value of the year before (j1992_12) is the same as the first value of the year that follows (j1994_01). If yes, I assume there was no change in work status, and all 12 months should get the value given in the last month of the year before. If not, everything should stay untouched.

Method so far:

DF93_2 = DF93
DF93_2[,2:13] <- ifelse (is.na( DF93[,2:13]) && (DF93[,1]==DF93[,14]), DF93[,1] , DF93[,2:13])

I now see that if I try it with just a single column, as in the code below, it replaces the whole column. How can I teach R to replace row-wise only?

DF93_2[,2] <- ifelse (is.na( DF93[,2:13]) && (DF93[,1]==DF93[,14]), DF93[,1] , DF93[,2])

If someone could please give me a hint where the flaw in my understanding of R is, I would be very grateful.

EDIT: Only the original file is longitudinal; the format is now wide, which is what I need for a time series analysis. It has already been cross-checked with survey data from all years (18 years, from 1992 to 2010), so I would rather not transform it back into long format. I am looking for a solution with conditions as described above that I can adjust as the condition changes.

After further testing, I think the problem lies in the search for 12 consecutive NAs in a row. I just cannot find a solution to that. If you have any idea, please share. Thank you!

R.bitrary
  • I can't use R where I am so it's only a guess but I'm not sure you can use boolean vectors in the `ifelse` clause. Don't you get only the first value of it ? (eg `if(c(T,F) && c(T,T))` gives `if(T)`) – Vincent Dec 04 '14 at 16:28
  • I'm not sure I understand. Are you trying to do a last observation carried forward (http://en.wikipedia.org/wiki/Analysis_of_clinical_trials#Last_observation_carried_forward)? See http://stackoverflow.com/questions/2776135/last-observation-carried-forward-in-a-data-frame –  Dec 04 '14 at 16:36
  • Do the 12 months need to be consecutive and in the same calendar year and does this need to work for all years? Overall this strikes me as a problem better handled in long form, with a column for year, a column for month, a column for individual, and a column for value. Lots of good tools to work with that kind of data. – farnsy Dec 04 '14 at 16:41
  • @what - na.locf is not exactly what I intended (I wanted to give R a clear condition instead of just reproducing the last non-NA value), and it also works within columns instead of rows. So instead of inserting a "7" in line 26, it inserts the "1" from line 25, or the "3" from line 27 if run backwards. – R.bitrary Dec 05 '14 at 09:05
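On the `&&` point raised in the comments: `&&` works on single values, while `&` compares element by element, which is what a per-row condition needs. A minimal sketch on the sample data from the question that builds one logical flag per row; the names `all_na`, `same` and `fill` are illustrative only, not part of the original code:

```r
# Sample data from the question, rebuilt compactly
DF93 <- data.frame(j1992_12 = c(1, 10, 1, 7, 1, 1))
for (m in sprintf("j1993_%02d", 1:12)) DF93[[m]] <- c(1, 8, 1, NA, 3, 1)
DF93$j1993_01[2] <- 1
DF93$j1993_02[2] <- 1
DF93$j1994_01 <- c(1, 8, 1, 7, 3, 1)

# `&` is elementwise and returns one result per position
c(TRUE, FALSE) & c(TRUE, TRUE)

# One flag per row: are all 12 months of 1993 missing?
all_na <- rowSums(is.na(DF93[, 2:13])) == 12

# One flag per row: do the bordering months agree?
same <- DF93[, 1] == DF93[, 14]

# Elementwise AND combines the two row-wise conditions
fill <- all_na & same
which(fill)
```

With the sample data only the fourth row satisfies both conditions, so only that row would be a candidate for filling.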

3 Answers

EWAZ99_2[,15:26] <- ifelse ( is.na( EWAZ99[,15:26]) & (EWAZ99[,14]==EWAZ99[,27]), EWAZ99[,14] , EWAZ99[,15:26])

I think this is what you are looking for.

anonR
  • Unfortunately, it does nothing to the result (same as my line of code in the question). Line 26 has the same 12 NA in 1993_1-12 as before. – R.bitrary Dec 05 '14 at 09:57
  • Can you share the mentioned lines of the data set here? The dummy data frame I built to test it gives appropriate results. – anonR Dec 05 '14 at 12:48
  • Sadly, it does not work. With just one "&" there is an error, "72 variables indicated to replace 12 variables"; otherwise there is no change. I have rewritten the question. – R.bitrary Dec 06 '14 at 15:32
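A likely reason for the "72 variables" message: `ifelse()` is meant for vectors and matrices, and when one of its arguments is a data frame the result comes back as a list of 6 x 12 = 72 elements instead of a 6 x 12 block. A hedged sketch that converts the sample data to a numeric matrix first; `m`, `border_same` and `filled` are illustrative names. Note that this fills every NA cell in rows whose bordering months match, not only gaps spanning the full 12 months:

```r
# Sample data from the question, rebuilt compactly
DF93 <- data.frame(j1992_12 = c(1, 10, 1, 7, 1, 1))
for (col in sprintf("j1993_%02d", 1:12)) DF93[[col]] <- c(1, 8, 1, NA, 3, 1)
DF93$j1993_01[2] <- 1
DF93$j1993_02[2] <- 1
DF93$j1994_01 <- c(1, 8, 1, 7, 3, 1)

# Work on a numeric matrix so ifelse() returns a matrix, not a list
m <- as.matrix(DF93)

# Length-6 vector; it is recycled down each column, so it lines up with rows
border_same <- m[, 1] == m[, 14]

filled <- ifelse(is.na(m[, 2:13]) & border_same, m[, 1], m[, 2:13])

DF93_2 <- DF93
DF93_2[, 2:13] <- filled
```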

Not sure if I understood you right; does something like this help?

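naAction <- function(x) {
  # x is one row: fill its NAs only if the first and last values match
  if (any(is.na(x))) {
    # isTRUE() treats an NA comparison as "no match" instead of raising an error
    if (isTRUE(x[1] == x[length(x)])) {
      x[is.na(x)] <- x[1]
    }
  }
  x
}

# MARGIN = 1 applies the function row-wise; t() restores the original orientation
t(apply(DF93, 1, naAction))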
johannes

Here's one way:

as.data.frame(t(apply(DF93, 1, function(x) 
  if(x[1] == tail(x, 1) && all(is.na(head(x, -1)[-1]))) 
    replace(x, is.na(x), x[1]) else x)))
Sven Hohenstein
  • That works well with the sample data. How can I apply it to a larger DF? I do not fully understand the "all(is.na(head(x, -1)[-1]" part. Doesn't it stop working if there are other NAs in the row? Thanks a lot so far. – R.bitrary Dec 08 '14 at 09:50
  • @R.bitrary It should work with larger data frames too. What is the exact problem? – Sven Hohenstein Dec 08 '14 at 10:55
  • How do I address the case where only a certain part of a row contains nothing but missings, for example columns 15:26, 27:38, 39:40 etc.? I tried to replace "all(is.na(head(x, -1)[-1])))" with "all(is.na(x[15:26])))" **because this worked with the sample data**, but with the bigger DF (377 rows, 117 variables) I just get the error "missing value where TRUE/FALSE needed". Note: NAs make up 25% of the DF, which means that the if(x[1]-condition can hit a missing value, too. – R.bitrary Dec 08 '14 at 12:08
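On the follow-up about larger data frames: `head(x, -1)[-1]` drops the last and then the first element, so it covers exactly the months between the two bordering columns. The "missing value where TRUE/FALSE needed" error appears when a bordering value is itself NA, because `if (NA)` is not allowed. A minimal sketch that handles both points and repeats the check for several 12-column blocks; `fill_year_gaps()` and its arguments are hypothetical names, and it assumes each block has one column directly before and one directly after it:

```r
# Hypothetical helper: df is the wide data frame, first_cols holds the
# column index of the first month of each year block to check
fill_year_gaps <- function(df, first_cols, width = 12) {
  for (start in first_cols) {
    cols   <- start:(start + width - 1)
    before <- df[[start - 1]]        # last month of the previous year
    after  <- df[[start + width]]    # first month of the following year

    # TRUE for rows where the whole block is missing
    all_na <- rowSums(is.na(df[, cols, drop = FALSE])) == width

    # The !is.na() guards turn NA comparisons into FALSE,
    # so rows with missing border values are simply skipped
    fill <- all_na & !is.na(before) & !is.na(after) & before == after

    for (j in cols) df[[j]][fill] <- before[fill]
  }
  df
}

# Sample data from the question, rebuilt compactly
DF93 <- data.frame(j1992_12 = c(1, 10, 1, 7, 1, 1))
for (m in sprintf("j1993_%02d", 1:12)) DF93[[m]] <- c(1, 8, 1, NA, 3, 1)
DF93$j1993_01[2] <- 1
DF93$j1993_02[2] <- 1
DF93$j1994_01 <- c(1, 8, 1, 7, 3, 1)

DF93_2 <- fill_year_gaps(DF93, first_cols = 2)
```

For the larger data frame described in the comment, the same call would take the first column of each year block, for example `first_cols = c(15, 27)`, assuming the layout matches the column ranges mentioned there. Rows with other scattered NAs are left untouched, since only fully missing blocks are filled.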