1

I have two time series which I would like to merge: activity is (almost) regular with 5 minute timesteps, the other, temperature, is an irregular time series, with timestamps which do not at any point match the timestamps in activity.

I would like add the column "temperature$temp" to the dataset activity, with NAs for times at which there are no temp records, and the actual temp records assigned to the closest timestamp in activity. (Alternatively to the closest prior or following timestamp).

Previously I used the approxfun function to interpolate the temperature data to match the activity timeseries, however this is not ideal and I would like to include only temperatures which were actually recorded.

I have so far been unable to modify the solutions of similar-seeming timeseries questions posted on stack overflow and elsewhere because they either assume that the timeseries will match at some times, or they aim for output which merges the time series so that the timestamps of both data sets are included, neither of which is the case here.

   activity <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819, 
   1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619, 
   1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419, 
   1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct", 
   "POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300, 
   300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300)), .Names = c("Date", 
   "Activity"), row.names = 1220:1238, class = "data.frame")

   temperature <- structure(list(Date = structure(c(1350543180, 1350547140), class =       c("POSIXct", 
  "POSIXt"), tzone = "GMT"), temp = c(12.625, 12.5)), .Names = c("Date", 
  "temp"), row.names = 2:3, class = "data.frame")

output <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819, 
1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619, 
1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419, 
1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct", 
"POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300, 
300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300), 
temp = c(NA, NA, NA, 12.625, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 12.5, NA)), .Names = c("Date", "Activity", 
"temp"), row.names = 1220:1238, class = "data.frame")

I would greatly appreciate any help or advice you could give me.

DIF
  • 2,470
  • 6
  • 35
  • 49
D Ward
  • 13
  • 3
  • The answers at http://stackoverflow.com/questions/10160400/r-find-nearest-index helped me compose an answer. – Andrie Jul 01 '14 at 08:30

2 Answers2

1

First, define a small helper function that determines the nearest index position:

findNearest <- function(x, y)which.min(abs(x - y))

Then use sapply with findNearest to create an index vector of your temperatures. Use this index vector to create the desired temperature vector

idx <- sapply(activity$Date, findNearest, temperature$Date)
activity$temp <- temperature$temp[idx]

The results:

head(activity)
                    Date Activity   temp
1220 2012-10-18 06:36:59      300 12.625
1221 2012-10-18 06:41:59      300 12.625
1222 2012-10-18 06:46:59      300 12.625
1223 2012-10-18 06:51:59      300 12.625
1224 2012-10-18 06:56:59      300 12.625
1225 2012-10-18 07:01:59      300 12.625


tail(activity)
                    Date Activity temp
1233 2012-10-18 07:41:59      300 12.5
1234 2012-10-18 07:46:59      300 12.5
1235 2012-10-18 07:51:59      300 12.5
1236 2012-10-18 07:56:59      300 12.5
1237 2012-10-18 08:01:59      300 12.5
1238 2012-10-18 08:06:59      300 12.5
Andrie
  • 176,377
  • 47
  • 447
  • 496
  • This solution works, but I really only want the temp column to give a temperature value at the closest time to which it was recorded and have the rest of the column filled with NAs, rather than have a temperature value assigned to every time. – D Ward Jul 03 '14 at 06:07
0

The roll= facility in data.table can do this:

library(data.table)
activity.dt <- data.table(activity, key="Date")[,Date2:=Date]
temperature.dt <- data.table(temperature)

activity.dt[temperature.dt, list(Date=Date2, temp), roll=-Inf][
  activity.dt, list(Date, Activity, temp)]

giving:

                   Date Activity   temp
 1: 2012-10-18 06:36:59      300     NA
 2: 2012-10-18 06:41:59      300     NA
 3: 2012-10-18 06:46:59      300     NA
 4: 2012-10-18 06:51:59      300     NA
 5: 2012-10-18 06:56:59      300 12.625
 6: 2012-10-18 07:01:59      300     NA
 7: 2012-10-18 07:06:59      300     NA
 8: 2012-10-18 07:11:59      207     NA
 9: 2012-10-18 07:16:59        0     NA
10: 2012-10-18 07:21:59        0     NA
11: 2012-10-18 07:26:59        0     NA
12: 2012-10-18 07:31:59        0     NA
13: 2012-10-18 07:36:59      153     NA
14: 2012-10-18 07:41:59      300     NA
15: 2012-10-18 07:46:59      300     NA
16: 2012-10-18 07:51:59      300     NA
17: 2012-10-18 07:56:59      300     NA
18: 2012-10-18 08:01:59      300 12.500
19: 2012-10-18 08:06:59      300     NA
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This solution appears to be ideal; however when I use it, the output temp column only contains NAs. activity.dt[temperature.dt, list(Date=Date2, temp), roll=-Inf] – D Ward Jul 03 '14 at 05:22
  • returns a data.table of the same dimensions as temperature.dt with 3 columns: Date (from temperature.dt), Date (from activity.dt) and temp. However the second part of that function returns activity.dt with an added (empty) Date column and a temp column containing NAs. Somehow the temps from the first part of the function are not being carried across into the second part. – D Ward Jul 03 '14 at 05:31
  • R version 3.1.0 (2014-04-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C [5] LC_TIME=English_Australia.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] data.table_1.9.2 loaded via a namespace (and not attached): [1] plyr_1.8.1 Rcpp_0.11.2 reshape2_1.4 stringr_0.6.2 tools_3.1.0 – D Ward Jul 03 '14 at 05:52
  • Try it with data.table 1.9.3 from github: `devtools::install_github("Rdatatable/datatable")` – G. Grothendieck Jul 03 '14 at 10:26