
I have a bunch of 10x2 tables that have missing values sandwiched between dates with existing values. I'm looking for the best way to infer the missing data from previous information. Example:

x1 <- c(1:10)
x2 <- c(NA, 'a', 'a', NA, 'a', 'b', 'b', NA, NA, 'c')
DF <- data.frame(x1,x2)
DF

x1   x2
 1 <NA>
 2    a
 3    a
 4 <NA>
 5    a
 6    b
 7    b
 8 <NA>
 9 <NA>
10    c

I want to fill in the missing values with the following algorithm:

  1. Find the last NA.
  2. Work backwards from it to the nearest preceding non-NA value and use that as the replacement. Then move to the second-to-last NA, and so on.
  3. If there is no preceding non-NA value (as is the case with row 1), work forward to the first non-NA value instead.

So final vector would be

a, a, a, a, a, b, b, b, b, c

I know I can get the list of NAs I want to replace with

Missing = rev(which(is.na(x2)))

and then use a for-loop from there. But I'll admit that I'm not that great of a programmer, and it would take me a long time to figure out (I'd probably have to brute-force it). Is there a package that can easily sort this out, or a reference manual for these sorts of data clean-up issues?

Nimantha
CoolGuyHasChillDay
  • Possibly related to this post? https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value – Z.Lin Aug 26 '17 at 07:51
  • yup, seems to be a duplicate. personally I find this here to be the easiest to read solution for that problem: https://rdrr.io/cran/tidyr/man/fill.html from tidyr package – Jan Aug 26 '17 at 07:54
  • Sorry about that, I really did try to look up previous entries, but the only one I could find had a vote of -9. I'll poke around these links, thanks. – CoolGuyHasChillDay Aug 26 '17 at 08:16

1 Answer

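library(dplyr)
library(tidyr)
df <- data.frame(x1 = 1:10, x2 = c(NA, 'a', 'a', NA, 'a', 'b', 'b', NA, NA, 'c'))
# fill() carries the last non-NA value downward by default, which leaves the
# leading NA in row 1 untouched; the second call fills it upward from row 2.
df1 <- df %>% fill(x2) %>% fill(x2, .direction = "up")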
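
If you'd rather not depend on a package, here is a minimal base R sketch of the algorithm described in the question. The helper name fill_na is hypothetical (it is not from any package), and it converts its input to character so that factor columns are handled the same way as character ones.

```r
# Hypothetical helper: fill each NA with the nearest preceding non-NA value,
# falling back to the first following non-NA value when nothing precedes it.
fill_na <- function(x) {
  x <- as.character(x)
  known <- which(!is.na(x))        # positions that already hold a value
  if (length(known) == 0) {
    return(x)                      # nothing to copy from, leave as-is
  }
  for (i in rev(which(is.na(x)))) { # walk the NAs from last to first
    prev <- known[known < i]
    if (length(prev) > 0) {
      x[i] <- x[max(prev)]         # nearest earlier non-NA value
    } else {
      x[i] <- x[min(known)]        # leading NA: take the first later value
    }
  }
  x
}

x2 <- c(NA, 'a', 'a', NA, 'a', 'b', 'b', NA, NA, 'c')
filled <- fill_na(x2)
print(filled)
# [1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "c"
```

To apply it to the data frame from the question, something like DF$x2 <- fill_na(DF$x2) should work. The loop is written for readability, not speed, which is fine for 10-row tables.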
Nimantha
Prem