I am getting my memory to swap with a pretty simple loop and I can't see the problem. I am working on a tool to clean time series with 10-minute time steps. The data may have gaps, duplicated time steps, and time steps that fall off the regular 10-minute grid. My approach is to generate the "clean" time series first and then match the "good" time steps against it. After that I would like to handle the time steps that fall off the regular 10-minute grid. This is where the problem appears. Sorry for the long code:
Test Data Generation:
rm(list = ls())
Sys.setenv(TZ="Europe/Berlin")
Sys.timezone()
DATE = seq(as.POSIXct("2015-03-28 00:00:00", tz="Europe/Berlin"),
           as.POSIXct("2015-04-26 23:00:00", tz="Europe/Berlin"), by = 600)
V1 = round(2*runif(length(DATE)), 2)
DF <- data.frame(DATE, V1)
Adding some "bad" data:
DF2 <- data.frame(DATE = as.POSIXct(c("2015-04-05 05:00:00",
                                      "2015-04-05 05:00:00",
                                      "2015-04-10 10:00:00",
                                      "2015-04-15 15:15:00",
                                      "2015-04-20 20:02:00",
                                      "2015-04-26 23:07:00",
                                      "2015-04-26 23:17:00",
                                      "2015-04-26 23:27:00",
                                      "2015-04-26 23:37:00"),
                                    tz="Europe/Berlin"),
                  # numeric values, so that rbind() does not turn V1 into character
                  V1 = rep(0.77, 9))
DF <- rbind(DF, DF2)
DF <- DF[ order(DF$DATE), ]
Defining some time variables and the final "clean" time series:
START_DATE <- as.POSIXct("2015-03-28 00:00:00", tz="Europe/Berlin")
END_DATE <- as.POSIXct("2015-04-26 23:40:00", tz="Europe/Berlin")
tdiff <- difftime("2015-03-28 00:10:00", "2015-03-28 00:00:00",
                  tz="Europe/Berlin", units = "mins")
DT <- seq( START_DATE, END_DATE, by = 600)
DF_clean <- DF[match(DT,DF$DATE), ]
So far so good: as you can see, DF_clean already looks pretty good, but the last 4 rows are NA, since those time steps were off the regular 10-minute interval. So I need to check whether there is any data in between these time steps and shift it to the correct 10-minute interval.
for (var in DT[ which( is.na(DF_clean$DATE))]) {
  has.value <- DF$DATE > as.POSIXct(var, origin="1970-01-01") - tdiff &
               DF$DATE < as.POSIXct(var, origin="1970-01-01")
  DF_clean[as.POSIXct(var, origin="1970-01-01"), ] <- DF[ has.value, ]
}
If I run the body of the for loop manually with var <- "2015-04-26 23:10:00 CEST", it works. Running the whole loop makes the machine start swapping. I think it has something to do with using POSIXct in the loop and inside the [], but I couldn't figure out how else to apply the - tdiff.
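A minimal sketch for narrowing this down (it is not part of the script above, and the names times, v, row_index and small are made up for illustration). It checks what the loop variable looks like inside a for loop and what happens when a row index lies far beyond the last row. As far as I understand base R, for iterates over the underlying numbers and drops the POSIXct class, so I suspect the row index in my loop ends up being a number of seconds since 1970, roughly 1.43 billion:

```r
# Illustration only: 'times', 'v', 'row_index' and 'small' are made-up names.
times <- as.POSIXct(c("2015-04-26 23:10:00", "2015-04-26 23:20:00"),
                    tz = "Europe/Berlin")

for (v in times) {
  # Inside the loop 'v' is a bare number (seconds since 1970-01-01),
  # not a POSIXct object any more.
  print(class(v))
}

# Seen as a number, a time stamp is a very large value.
row_index <- as.numeric(times[1])
print(row_index)

# Assigning to a row index beyond nrow() extends the data frame up to that
# row. Shown here with a small index so that it stays cheap.
small <- data.frame(x = 1:3)
small[10, ] <- 99L
print(nrow(small))
```

If the same thing happens with an index in the billions, that would explain why the machine runs out of memory, but this is only my assumption so far.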
I haven't tried any packages yet because I am actually interested in a base R solution: I was advised here earlier to avoid packages until I really understand base R. ;)