
This is a follow-up to the following question: Efficient dynamic addition of rows in dataframe and dynamic calculation in R

I have the following table:

Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 19:00,0.03
4,21/11/2014 16:00,0.04
5,21/11/2014 17:00,0.06
6,21/11/2014 20:00,0.10"

As can be seen, there is a gap at 18:00 on 20/11/2014 and a two-hour gap (18:00 and 19:00) on 21/11/2014. There is an additional gap between 20/11/2014 19:00 and 21/11/2014 16:00. I would like to interpolate (fill in) values only where the gap between consecutive rows is up to 3 hours, and leave longer gaps unfilled. The required result should be as follows (in data frame format):

Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 18:00,0.025  <-- added
4,20/11/2014 19:00,0.03
5,21/11/2014 16:00,0.04
6,21/11/2014 17:00,0.06
7,21/11/2014 18:00,0.0733  <-- added
8,21/11/2014 19:00,0.0867  <-- added
9,21/11/2014 20:00,0.10"

Here is the code I currently use. It fills in every gap, including the overnight gap between the two days, which is longer than 3 hours and should stay unfilled:

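library(zoo)
# read the data, parsing D1 as a date-time index
z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
# interpolate onto an hourly grid; this fills every gap, including the overnight one
interpolated1 <- na.approx(z, xout = seq(start(z), end(z), "hours"))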
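To see which gaps qualify, it may help to list the spacing between consecutive rows. A minimal base-R sketch, assuming the D1 times are re-typed as a vector and parsed in UTC to avoid daylight-saving surprises (`times`, `gap_hours`, `missing` and `fill_me` are illustrative names):

```r
# D1 times from the question, parsed as UTC (assumption)
times <- as.POSIXct(c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00",
                      "21/11/2014 16:00", "21/11/2014 17:00", "21/11/2014 20:00"),
                    format = "%d/%m/%Y %H:%M", tz = "UTC")

# Hours between each pair of consecutive rows
gap_hours <- as.numeric(diff(times), units = "hours")

# Number of missing hourly points inside each gap
missing <- gap_hours - 1

# Gaps to fill: at least one missing point, at most 3 (same rule as maxgap = 3)
fill_me <- missing >= 1 & missing <= 3
gap_hours
fill_me
```

Here "up to 3 hours" is read as "at most 3 missing hourly points", which is what `maxgap = 3` in zoo means; for this data both readings select the same two gaps and skip the 21-hour overnight gap.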
Avi
  • Your results include an interpolated value for `21/11/2014 20:00`, but that point in time was already part of the original data. Do you mean to interpolate only `21/11/2014 18:00` and `21/11/2014 19:00`? – arvi1000 Jan 19 '16 at 21:57

2 Answers


We can merge z with a zero-width zoo series z0 based on a grid of hours. This turns z into an hourly series with NAs at the missing hours. Then use the maxgap argument of na.approx, as shown below, to fill in only the desired gaps. This still leaves NAs in the longer gaps, so remove them using na.omit.
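The effect of `maxgap` can be seen on a plain numeric vector. This is a small sketch with made-up values, not the question's data:

```r
library(zoo)

# Values with a 1-point gap, a 2-point gap and a 4-point gap
x <- c(1, NA, 3, 4, NA, NA, 7, 8, NA, NA, NA, NA, 13)

# maxgap = 3: runs of up to 3 consecutive NAs are interpolated,
# longer runs stay NA (na.rm = FALSE keeps the vector length)
filled <- na.approx(x, maxgap = 3, na.rm = FALSE)
filled
# expected: 1 2 3 4 5 6 7 8 NA NA NA NA 13

# Dropping the unfilled points, which is what na.omit does in the answer
kept <- filled[!is.na(filled)]
```

Note that `maxgap` counts consecutive NAs, not hours; on an hourly grid one NA is one missing hour, so the two coincide.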

fortify.zoo(z3) would transform the result into a data frame. However, z3 (the resulting series, with only gaps of up to 3 points filled) is a time series, so this is probably not a good idea; it would be better to leave it as a zoo object so that you can use all of zoo's facilities.

No packages other than zoo are used.

z0 <- zoo(, seq(start(z), end(z), "hours"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))

giving:

> z3
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00 
         0.01000000          0.02000000          0.02500000          0.03000000 
2014-11-21 16:00:00 2014-11-21 17:00:00 2014-11-21 18:00:00 2014-11-21 19:00:00 
         0.04000000          0.06000000          0.07333333          0.08666667 
2014-11-21 20:00:00 
         0.10000000 
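If a data frame is still needed, one option is to build it from `time()` and `coredata()` instead of fortify.zoo. The sketch below re-creates `z` from the question's data (re-typed without the row-number column, an assumption, and parsed in UTC) so that it runs on its own; `DF` is an illustrative name:

```r
library(zoo)

# Question data without the row-number column (re-typed here as an assumption)
Lines <- "D1,Diff
20/11/2014 16:00,0.01
20/11/2014 17:00,0.02
20/11/2014 19:00,0.03
21/11/2014 16:00,0.04
21/11/2014 17:00,0.06
21/11/2014 20:00,0.10"

z <- read.zoo(text = Lines, header = TRUE, sep = ",",
              format = "%d/%m/%Y %H:%M", tz = "UTC")

# Same steps as the answer: hourly grid, fill runs of <= 3 NAs, drop the rest
z0 <- zoo(, seq(start(z), end(z), "hours"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))

# Plain data frame with the question's column names and date format
DF <- data.frame(D1 = format(time(z3), "%d/%m/%Y %H:%M"),
                 Diff = as.vector(coredata(z3)))
DF
```

Formatting D1 as text matches the layout asked for in the question but loses the date-time class; keep `time(z3)` unformatted if further time arithmetic is needed.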
G. Grothendieck

Source 1: Creating a specific sequence of date/times in R. Answer by mnel on Sep 13 2012 and edit by Matt Dowle on Sep 13 2012

&

Source 2: Creating regular 15-minute time-series from irregular time-series. Answer by mnel on Sep 13 2012 and edit by Dirk Eddelbuettel on May 3 2012

library(zoo)
library(xts)
library(data.table)
library(devtools)
devtools::install_github("iembry-USGS/ie2misc")
library(ie2misc)
# iembry released a version of ie2misc so you should be able to install
# the package now
# `na.interp1` is a function that combines zoo's `na.approx` and pracma's
# `interp1`

The rest of the code assumes that your z zoo object has already been created, as in the question.

## Source 1 begins
startdate <- as.character(start(z))
# set the start date/time as the 1st entry in the time series and make
# this a character vector.

starttime <- as.POSIXct(startdate)
# transform the character vector to a POSIXct object (named starttime so
# that it is not confused with zoo's start() function)

enddate <- as.character(end(z))
# set the end date/time as the last entry in the time series and make
# this a character vector.

endtime <- as.POSIXct(enddate)
# transform the character vector to a POSIXct object (named endtime so
# that it is not confused with zoo's end() function)

gridtime <- seq(from = starttime, by = 3600, to = endtime)
# create a sequence beginning with the start date/time with a 60-minute
# (3600-second) interval ending at the end date/time
## Source 1 ends

## Source 2 begins
timeframe <- data.frame(rep(NA, length(gridtime)))
# create 1 NA column spaced out by the gridtime to complement the single 
# column of z

timelength <- xts(timeframe, order.by = gridtime)
# create a xts time series object using timeframe and gridtime

zDate <- merge(timelength, z)
# merge the z zoo object and the timelength xts object  
## Source 2 ends
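For comparison, the same grid-and-merge idea can be sketched in base R with a left join, without xts. This assumes the times are parsed as UTC; `obs`, `grid`, `merged` and `same_steps` are illustrative names. It also checks that `by = "hour"` gives the same steps as the `by = 3600` used in Source 1:

```r
# Observed rows from the question; parsing the times as UTC is an assumption
obs <- data.frame(
  D1 = as.POSIXct(c("2014-11-20 16:00", "2014-11-20 17:00", "2014-11-20 19:00",
                    "2014-11-21 16:00", "2014-11-21 17:00", "2014-11-21 20:00"),
                  tz = "UTC"),
  diff = c(0.01, 0.02, 0.03, 0.04, 0.06, 0.10)
)

# Hourly grid from the first to the last observation
grid <- data.frame(D1 = seq(min(obs$D1), max(obs$D1), by = "hour"))

# by = "hour" and by = 3600 (seconds) should produce the same time points
same_steps <- all(grid$D1 == seq(min(obs$D1), max(obs$D1), by = 3600))

# Left join: every grid time is kept, hours without an observation get NA
merged <- merge(grid, obs, by = "D1", all.x = TRUE)

nrow(merged)             # hourly rows in the grid
sum(is.na(merged$diff))  # hours that still need a value
```

The NA rows in `merged` play the same role as the NA column added through `timelength` above: they mark the hours that the interpolation step may fill.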

The next steps interpolate your data as requested.

Lines <- as.data.frame(zDate)
# to data.frame from zoo

Lines[, "D1"] <- rownames(Lines)
# create column named D1

Lines <- setDT(Lines)
# create data.table out of data.frame

setcolorder(Lines, c(3, 2, 1))
# set the column order as the 3rd column followed by the 2nd and 1st 
# columns

Lines <- Lines[, 3 := NULL]
# remove the 3rd column

setnames(Lines, 2, "diff")
# change the name of the 2nd column to diff

Lines <- setDF(Lines)
# return to data.frame

rowsinterps1 <- which(is.na(Lines$diff))
# index of rows of Lines that have NA (to be interpolated)

timenum <- as.numeric(as.POSIXct(Lines$D1))
# the Date-Times in numeric format (D1 holds character row names, so it
# must be converted to POSIXct before as.numeric)

xi <- timenum[rowsinterps1]
# the Date-Times for diff to be interpolated in numeric format

interps1 <- na.interp1(timenum, Lines$diff, xi = xi,
na.rm = FALSE, maxgap = 3)
# the interpolated values where only gaps of up to 3 are filled

Lines[rowsinterps1, 2] <- interps1
# replace the NAs in diff with the interpolated diff values

Lines <- na.omit(Lines) # remove rows with NAs
Lines

This is the Lines data.frame:

Lines
                D1       diff
1  2014-11-20 16:00:00 0.01000000
2  2014-11-20 17:00:00 0.02000000
3  2014-11-20 18:00:00 0.02500000
4  2014-11-20 19:00:00 0.03000000
25 2014-11-21 16:00:00 0.04000000
26 2014-11-21 17:00:00 0.06000000
27 2014-11-21 18:00:00 0.07333333
28 2014-11-21 19:00:00 0.08666667
29 2014-11-21 20:00:00 0.10000000
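If installing ie2misc is a problem, the gap-limited interpolation can also be sketched with base R's `approx()` plus `rle()`. This is not how `na.interp1` is implemented, just an alternative that assumes the values already sit on an equally spaced hourly grid, so that row position can stand in for time; `hourly`, `run_len` and `result` are illustrative names:

```r
# diff values on the hourly grid after merging (NA = no observation),
# re-typed from the question's data as an assumption
hourly <- c(0.01, 0.02, NA, 0.03, rep(NA, 20), 0.04, 0.06, NA, NA, 0.10)
maxgap <- 3

# Linear interpolation at every position from the non-missing points
idx <- seq_along(hourly)
ok <- !is.na(hourly)
interp <- approx(idx[ok], hourly[ok], xout = idx)$y

# Length of the NA run each position belongs to (0 for observed values)
runs <- rle(is.na(hourly))
run_len <- rep(ifelse(runs$values, runs$lengths, 0L), runs$lengths)

# Keep observed values, fill only NAs in runs of length <= maxgap
filled <- ifelse(is.na(hourly) & run_len <= maxgap, interp, hourly)

# Drop the positions in longer runs, which are still NA
result <- filled[!is.na(filled)]
result
```

NA runs longer than `maxgap` keep their NAs and are dropped at the end, which mirrors the `maxgap` plus `na.omit` pattern from the other answer.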
iembry
  • Thanks @iembry, I get the following errors and Warnings: > xi <- as.numeric(Lines[which(is.na(Lines$diff == TRUE)), 1]) Warning message: NAs introduced by coercion.... and... > interps1 <- na.interp1(as.numeric(Lines$Time), Lines$diff, xi = xi, na.rm = FALSE, maxgap = 3) Error: could not find function "na.interp1" – Avi Jan 16 '16 at 20:09
  • > source("https://raw.githubusercontent.com/iembry-USGS/ie2misc/master/R/na.interp1.R") Error in source("https://raw.githubusercontent.com/iembry-USGS/ie2misc/master/R/na.interp1.R") : https://raw.githubusercontent.com/iembry-USGS/ie2misc/master/R/na.interp1.R:1:2: unexpected input 1: ï» – Avi Jan 16 '16 at 20:22
  • I used the following commands: devtools::install_github("iembry-USGS/ie2misc") library(iembry) and got: > devtools::install_github("iembry-USGS/ie2misc") Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘digest’ > > library(iembry) Error in library(iembry) : there is no package called ‘iembry’ – Avi Jan 16 '16 at 22:00