In a data.frame, I would like to create a function that does the following for each row:
- Retains NA values prior to the first non-NA
- After the first non-NA value, "fills forward" NAs with the closest previous non-NA value
- Replaces all of the original non-NA values with NAs
I realize that step 2 can be accomplished with the na.locf() function in the 'zoo' package, but I'm unsure how to write a function that can "recall" which values were originally non-NA, so that I can replace them with NAs in the last step. Similarly, identifying the first or last non-NA value within each row is straightforward, but it's the middle values that have me at a loss. Here's an example with code:
# Example input
dm <- data.frame(rbind(c(NA, 1, NA, NA, 2, NA, NA, 3),
                       c(1, 1, NA, 2, NA, 3, 3, 3),
                       c(NA, NA, 5, NA, NA, NA, 6, NA)))
# Desired output
dm2 <- data.frame(rbind(c(NA, NA, 1, 1, NA, 2, 2, NA),
                        c(NA, NA, 1, NA, 2, NA, NA, NA),
                        c(NA, NA, NA, 5, 5, 5, NA, 6)))
> dm
  X1 X2 X3 X4 X5 X6 X7 X8
1 NA  1 NA NA  2 NA NA  3
2  1  1 NA  2 NA  3  3  3
3 NA NA  5 NA NA NA  6 NA
> dm2
  X1 X2 X3 X4 X5 X6 X7 X8
1 NA NA  1  1 NA  2  2 NA
2 NA NA  1 NA  2 NA NA NA
3 NA NA NA  5  5  5 NA  6
A little more about my data: it's composed of whole integers or NA values, as shown. Within each row, moving left to right, the numeric values will either stay the same, increase, or be NA, but never decrease. The number of non-NA values in a row could theoretically vary from 1 to ncol.
I realize this is a rather specific question; any suggestions or help would be much appreciated!
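
To make the three steps concrete, here is a minimal base-R sketch of one way they might be combined. It is only an illustration under stated assumptions: the helper name fill_gaps and the result name dm_out are made up for this example, and the forward fill is done with cumsum() rather than na.locf() so that the block runs without the 'zoo' package. The idea is to record which positions were originally non-NA before filling, then use that same logical mask to blank them out afterwards.

```r
# Example input from the question
dm <- data.frame(rbind(c(NA, 1, NA, NA, 2, NA, NA, 3),
                       c(1, 1, NA, 2, NA, 3, 3, 3),
                       c(NA, NA, 5, NA, NA, NA, 6, NA)))

# fill_gaps() is a hypothetical helper name, not part of any package.
fill_gaps <- function(x) {
  ok  <- !is.na(x)                  # remember which values were originally non-NA
  idx <- cumsum(ok)                 # count of non-NAs seen so far; 0 before the first
  filled <- c(NA, x[ok])[idx + 1]   # forward fill; leading NAs stay NA
  filled[ok] <- NA                  # replace the original non-NA values with NA
  names(filled) <- names(x)         # keep the column names for apply()
  filled
}

# apply() works row by row and returns the rows as columns, hence t()
dm_out <- as.data.frame(t(apply(dm, 1, fill_gaps)))
dm_out
```

The line that builds filled should be interchangeable with zoo::na.locf(x, na.rm = FALSE) if the package is available; the part that matters for "recalling" the original values is that ok is computed before the fill and reused after it.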