
I'm trying to fill the gaps in a fully outer-joined table with the nearest preceding non-NA value in each column.

My data frame looks like this (no row has both X and Y missing, and the table is sorted by date):

              date     X         Y
2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121     NA
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121     NA
2012-07-05 04:20:50   0.0121     NA
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121     NA
2012-07-05 20:48:45   0.0121     NA
2012-07-05 23:02:34    NA      0.029
2012-07-05 23:30:45    NA      0.029

With this, I'm looking to:

  1. Leave the leading rows that have no earlier value to fill from as they are (still NA).
  2. If either side is missing (NA), fill it with the value from the nearest preceding row that has a valid (non-NA) value in that same column.
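Step 2 is what is usually called "last observation carried forward" (LOCF). As a rough base-R sketch of the rule (the helper name fill_forward is hypothetical, not from zoo or base R), each position takes the value at the most recent non-NA index, and positions before the first non-NA value stay NA, which covers step 1:

```r
# Hypothetical helper (not from zoo or base R): last observation carried forward.
# Leading NAs, which have no preceding value, are left as NA (step 1).
fill_forward <- function(x) {
  # Index of the most recent non-NA element at each position (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  # No preceding value: index with NA so the result stays NA
  idx[idx == 0] <- NA
  x[idx]
}

# X and Y columns from the question (date column omitted for brevity)
df <- data.frame(
  X = c(0.0122, rep(0.0121, 9), NA, NA),
  Y = c(NA, NA, 0.027, NA, 0.028, NA, NA, 0.027, NA, NA, 0.029, 0.029)
)
df[] <- lapply(df, fill_forward)
df
```

Because the helper only looks backwards, it assumes the rows are already sorted by date, as stated above.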

As a result, I would like the table to look like this:

              date     X         Y
2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121   0.027
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121   0.028
2012-07-05 04:20:50   0.0121   0.028
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121   0.027
2012-07-05 20:48:45   0.0121   0.027
2012-07-05 23:02:34   0.0121   0.029
2012-07-05 23:30:45   0.0121   0.029

Which R commands can I use to achieve this?

GSee
beaglebagel

2 Answers


Use na.locf from the zoo package

dat <- read.table(text="2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121     NA
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121     NA
2012-07-05 04:20:50   0.0121     NA
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121     NA
2012-07-05 20:48:45   0.0121     NA
2012-07-05 23:02:34    NA      0.029
2012-07-05 23:30:45    NA      0.029")

require("zoo")
na.locf(dat)
#           V1       V2     V3    V4
#1  2012-07-05 00:01:19 0.0122  <NA>
#2  2012-07-05 03:19:34 0.0121  <NA>
#3  2012-07-05 03:19:56 0.0121 0.027
#4  2012-07-05 03:20:31 0.0121 0.027
#5  2012-07-05 04:19:56 0.0121 0.028
#6  2012-07-05 04:20:31 0.0121 0.028
#7  2012-07-05 04:20:50 0.0121 0.028
#8  2012-07-05 04:22:29 0.0121 0.027
#9  2012-07-05 04:24:37 0.0121 0.027
#10 2012-07-05 20:48:45 0.0121 0.027
#11 2012-07-05 23:02:34 0.0121 0.029
#12 2012-07-05 23:30:45 0.0121 0.029
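The read.table call above splits each timestamp into two columns (V1 and V2). A sketch that keeps a single POSIXct date column and fills only X and Y, assuming the zoo package is installed and that the timestamps are in UTC (the question does not state a time zone; the column names day and time are illustrative):

```r
library(zoo)

txt <- "2012-07-05 00:01:19   0.0122     NA
2012-07-05 03:19:34   0.0121     NA
2012-07-05 03:19:56   0.0121   0.027
2012-07-05 03:20:31   0.0121     NA
2012-07-05 04:19:56   0.0121   0.028
2012-07-05 04:20:31   0.0121     NA
2012-07-05 04:20:50   0.0121     NA
2012-07-05 04:22:29   0.0121   0.027
2012-07-05 04:24:37   0.0121     NA
2012-07-05 20:48:45   0.0121     NA
2012-07-05 23:02:34    NA      0.029
2012-07-05 23:30:45    NA      0.029"

# Read the day and clock time as separate columns, then combine them
dat <- read.table(text = txt, col.names = c("day", "time", "X", "Y"))
# Assumption: timestamps are UTC; the question does not say
dat$date <- as.POSIXct(paste(dat$day, dat$time), tz = "UTC")
dat <- dat[, c("date", "X", "Y")]

# Fill each value column separately; na.rm = FALSE keeps the leading NAs,
# so every filled column still has one entry per row
dat[c("X", "Y")] <- lapply(dat[c("X", "Y")], na.locf, na.rm = FALSE)
dat
```

Filling column by column leaves the date column untouched, so the result stays a numeric data frame rather than a character matrix.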
GSee
Thank you for the pointer; it does indeed work. For some reason I had to specify the na.rm=F option, and then I got exactly what I wanted. Thanks again! – beaglebagel Jul 17 '12 at 00:45
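The need for na.rm = FALSE mentioned in the comment comes from zoo's default, na.rm = TRUE, under which NAs with no earlier non-NA value are omitted, so the filled result can be shorter than the input. A small illustration on the first five Y values from the question, assuming zoo is installed:

```r
library(zoo)

y <- c(NA, NA, 0.027, NA, 0.028)

# Default na.rm = TRUE: the two leading NAs are dropped, so the result is shorter
na.locf(y)

# na.rm = FALSE: leading NAs are kept and the length matches y
na.locf(y, na.rm = FALSE)
```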

Maybe you should try this

> library(zoo)
> na.locf(Data)
                x     y
2012-07-17 0.0122  <NA>
2012-07-18 0.0121  <NA>
2012-07-19 0.0121 0.027
2012-07-20 0.0121 0.027
2012-07-21 0.0121 0.028
2012-07-22 0.0121 0.028
2012-07-23 0.0121 0.028
2012-07-24 0.0121 0.027
2012-07-25 0.0121 0.027
2012-07-26 0.0121 0.027
2012-07-27 0.0121 0.029
2012-07-28 0.0121 0.029

where Data is your data.frame
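The date row labels in that output suggest Data was indexed by date, as a zoo object would be. A sketch of the same idea using the question's own timestamps as the index, assuming zoo is installed and the times are UTC (values and times are copied from the question; the object names times, z and filled are illustrative):

```r
library(zoo)

# Timestamps from the question (assumed UTC)
times <- as.POSIXct(c(
  "2012-07-05 00:01:19", "2012-07-05 03:19:34", "2012-07-05 03:19:56",
  "2012-07-05 03:20:31", "2012-07-05 04:19:56", "2012-07-05 04:20:31",
  "2012-07-05 04:20:50", "2012-07-05 04:22:29", "2012-07-05 04:24:37",
  "2012-07-05 20:48:45", "2012-07-05 23:02:34", "2012-07-05 23:30:45"
), tz = "UTC")

# Two-column zoo series ordered by time
z <- zoo(cbind(
  X = c(0.0122, rep(0.0121, 9), NA, NA),
  Y = c(NA, NA, 0.027, NA, 0.028, NA, NA, 0.027, NA, NA, 0.029, 0.029)
), order.by = times)

# Carry the last non-NA value forward in each column, keeping leading NAs
filled <- na.locf(z, na.rm = FALSE)
filled
```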

Jilber Urbina