
I have written R code that merges two data frames on their first column and, where data is missing, fills in the value from the row above. Here is what it does:

Two input data frames:

1 a
2 b
3 c
5 d

And

1 e
4 f
6 g

My code gives this output:

   1 a e
   2 b e
   3 c e
   4 c f
   5 d f
   6 d g

However, my code is inefficient because it is not properly vectorized. Are there R functions I could use instead? Essentially, I am looking for a function that fills in missing (NA) values by taking the value of the previous element and putting it in place of the NA.

I looked through an R reference book but could not find anything.
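The fill step described above can be done in vectorized base R without any extra package. The following is only a minimal sketch. The helper name fill_down is hypothetical and is not part of base R or zoo. For each position, cumsum(!is.na(x)) counts how many non-NA elements have appeared so far. That count then serves as an index into the positions of those non-NA elements. Leading NAs, which have no value above them, stay NA.

```r
# Hypothetical helper (not from base R or zoo): carry the last non-NA value forward.
fill_down <- function(x) {
  # Positions of the non-NA elements, in order
  non_na <- which(!is.na(x))
  # For each position, how many non-NA elements have been seen so far (0 = none yet)
  seen <- cumsum(!is.na(x))
  # Map each position to its most recent non-NA element; a count of 0 maps to NA
  x[c(NA_integer_, non_na)[seen + 1L]]
}

fill_down(c("a", "b", "c", NA, "d", NA))
# c("a", "b", "c", "c", "d", "d")
```

No loop is involved, so the cost is a few passes over the vector. Factors work as well, because the helper only subsets x.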

Gavin Simpson
Dmitrii I.

1 Answer


Here is a solution using zoo::na.locf ("last observation carried forward"), which replaces each NA with the most recent non-NA value before it:

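library(zoo)

a <- data.frame(id = c(1, 2, 3, 5), v = c("a", "b", "c", "d"))
b <- data.frame(id = c(1, 4, 6), v = c("e", "f", "g"))

# Largest id present in either data frame
n <- max(c(a$id, b$id))

# Expand each data frame to the full id sequence 1..n; missing ids get NA
an <- merge(data.frame(id = 1:n), a, all.x = TRUE)
bn <- merge(data.frame(id = 1:n), b, all.x = TRUE)

# Replace each NA with the last non-NA value above it
an$v <- na.locf(an$v)
bn$v <- na.locf(bn$v)

data.frame(an$id, an$v, bn$v)
  an.id an.v bn.v
1     1    a    e
2     2    b    e
3     3    c    e
4     4    c    f
5     5    d    f
6     6    d    g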
johannes
  • You should merge the two data frames directly with all=T and sort=F instead of creating the full sequence of ids (otherwise you may create ids that do not exist in either data frame). – digEmAll Oct 12 '12 at 13:21
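The comment's direct-merge suggestion can be sketched as follows. This is a hedged sketch, not the commenter's code. It uses base R only, with a hypothetical helper fill_down standing in for zoo::na.locf. It also keeps merge's default sort = TRUE rather than sort = FALSE. The reason is that carrying values forward only makes sense when rows are in id order, and the merge documentation describes the order of an unsorted result as unspecified.

```r
# Same example data as in the answer
a <- data.frame(id = c(1, 2, 3, 5), v = c("a", "b", "c", "d"))
b <- data.frame(id = c(1, 4, 6), v = c("e", "f", "g"))

# Hypothetical base-R stand-in for zoo::na.locf: carry the last non-NA value forward
fill_down <- function(x) {
  non_na <- which(!is.na(x))
  x[c(NA_integer_, non_na)[cumsum(!is.na(x)) + 1L]]
}

# Full outer join on id: only ids present in a or b appear, none are invented.
# The default sort = TRUE orders rows by id, which the fill below relies on.
m <- merge(a, b, by = "id", all = TRUE)

# Both inputs have a column "v", so merge names them v.x (from a) and v.y (from b)
m$v.x <- fill_down(m$v.x)
m$v.y <- fill_down(m$v.y)
m
```

With this data the union of ids is 1 to 6, so the result matches the table in the answer. With sparser ids, such as 1, 3 and 10, only those ids would appear. If zoo is available, na.locf(m$v.x) can replace fill_down(m$v.x).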