I have a data.table and a list of date ranges. I wish to filter and amend rows using a function that checks whether each date falls within any of the ranges in the list.
library(data.table)

# example data
set.seed(1)
tt <- sample(
seq(as.POSIXct("2011-10-02"), as.POSIXct("2014-04-06"),
by = "day"), 10)
IR1 <- data.table(tstamp = sort(tt), dLoc = 1L:10L)
List of date ranges:
DLSlist <- lapply(
list(dls11t12 = c("2011-10-02", "2012-04-01"),
dls12t13 = c("2012-10-07", "2013-04-07"),
dls13t14 = c("2013-10-06", "2014-04-06"),
dls14t15 = c("2014-10-05", "2015-04-05"),
dls15t16 = c("2015-10-04", "2016-04-03"),
dls16t17 = c("2016-10-02", "2017-04-02")
),
function(X) as.POSIXct(X)
)
I would like to transform the rows whose tstamp falls inside any of the date ranges in DLSlist. I can do it the long way as follows:
IR1[tstamp > DLSlist[[1]][1] & tstamp < DLSlist[[1]][2], tstamp := tstamp + 60*60]
IR1[tstamp > DLSlist[[2]][1] & tstamp < DLSlist[[2]][2], tstamp := tstamp + 60*60]
IR1[tstamp > DLSlist[[3]][1] & tstamp < DLSlist[[3]][2], tstamp := tstamp + 60*60]
However, that seems error-prone; a function seems better suited to this task. Here is my attempt:
DLStest <- function(dd, DLSobj) {
any(sapply(DLSobj, function(X) dd %between% X))
}
I applied it with:
IR1[DLStest(tstamp, DLSlist), tstamp := tstamp + 60*60]
However, it didn't work: all of the rows were transformed, not only the ones inside the ranges (as had been the case with my ugly hack).
Is there a way to select rows using a function, or some other way to select rows based on multiple range checks?
Update (with thanks to Frank, who spotted the issue)
You can indeed filter with a function that returns a vector of booleans. The error was entirely in my initial function.
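To illustrate the point, i does not even need a function call: any logical vector with one element per row will do. The minimal sketch below assumes a hypothetical five-row table (reusing dates that appear in the output further down) and only the first three ranges. It builds the mask with a plain loop and keeps the strict > and < bounds of the long way (note that %between% includes both endpoints by default). Because the mask is computed before tstamp is modified, a shifted timestamp is never re-tested against a later range.

```r
library(data.table)

# Hypothetical five-row table and the first three ranges from the question
IR1 <- data.table(
  tstamp = as.POSIXct(c("2011-11-27", "2012-04-03", "2012-06-01",
                        "2013-03-09", "2013-12-29")),
  dLoc = 1L:5L
)
DLSlist <- lapply(
  list(dls11t12 = c("2011-10-02", "2012-04-01"),
       dls12t13 = c("2012-10-07", "2013-04-07"),
       dls13t14 = c("2013-10-06", "2014-04-06")),
  as.POSIXct
)

# One logical element per row; OR in the result of each range check
inside <- rep(FALSE, nrow(IR1))
for (k in seq_along(DLSlist)) {
  X <- DLSlist[[k]]
  inside <- inside | (IR1$tstamp > X[1] & IR1$tstamp < X[2])
}
inside
# [1]  TRUE FALSE FALSE  TRUE  TRUE

# Apply the change once, using the precomputed mask as i
IR1[inside, tstamp := tstamp + 60*60]
```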
DLStest_old <- function(dd, DLSobj) {
any(sapply(DLSobj, function(X) dd %between% X))
}
sapply returns a matrix here (one row per timestamp, one column per date range); any then checks whether there are any TRUE values in the entire matrix. If there are, it evaluates to a single TRUE; if not, to a single FALSE. A single TRUE in i selects every row, which is why all of the rows were transformed.
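A toy illustration with hypothetical numbers shows the difference between collapsing the whole matrix and collapsing each row. It uses plain comparisons (dd >= X[1] & dd <= X[2]), which match the default inclusive bounds of %between%, so it needs only base R.

```r
# Hypothetical values: 3 numbers tested against 2 ranges
dd <- c(2, 5, 7)
ranges <- list(a = c(1, 3), b = c(6, 8))

# sapply simplifies to a 3 x 2 logical matrix:
# one row per value, one column per range
m <- sapply(ranges, function(X) dd >= X[1] & dd <= X[2])
dim(m)
# [1] 3 2

# any() collapses the whole matrix to a single value
any(m)
# [1] TRUE

# apply(..., 1, any) keeps one result per row, which is what i needs
apply(m, 1, any)
# [1]  TRUE FALSE  TRUE
```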
Using the test data:
(IR1[DLStest_old(tstamp, DLSlist), dLoc := dLoc + 1000L])
tstamp dLoc
1: 2011-11-27 01:00:00 1001
2: 2012-04-03 00:00:00 1002
3: 2012-06-01 00:00:00 1003
4: 2012-09-06 00:00:00 1004
5: 2013-03-09 01:00:00 1005
6: 2013-04-25 00:00:00 1006
7: 2013-05-25 00:00:00 1007
8: 2013-12-29 01:00:00 1008
9: 2014-01-09 01:00:00 1009
10: 2014-02-08 01:00:00 1010
The fix is to test each row of the matrix separately, using apply:
DLStest <- function(dd, DLSobj) {
apply(sapply(DLSobj, function(X) dd %between% X), 1, any)
}
This now works:
(IR1[DLStest(tstamp, DLSlist), dLoc := dLoc + 1000L])
tstamp dLoc
1: 2011-11-27 01:00:00 1001
2: 2012-04-03 00:00:00 2
3: 2012-06-01 00:00:00 3
4: 2012-09-06 00:00:00 4
5: 2013-03-09 01:00:00 1005
6: 2013-04-25 00:00:00 6
7: 2013-05-25 00:00:00 7
8: 2013-12-29 01:00:00 1008
9: 2014-01-09 01:00:00 1009
10: 2014-02-08 01:00:00 1010
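Two further options, offered as sketches rather than as the approach from the update. The first replaces apply(sapply(...), 1, any) with Reduce over a list of logical vectors (DLStest_reduce is a hypothetical name). The second uses data.table's inrange(), which tests each value against a whole set of [lower, upper] intervals, inclusive by default. Both assume a hypothetical five-row table reusing dates from the output above, and only the first three ranges.

```r
library(data.table)

# Hypothetical five-row table and the first three ranges from the question
IR1 <- data.table(
  tstamp = as.POSIXct(c("2011-11-27", "2012-04-03", "2012-06-01",
                        "2013-03-09", "2013-12-29")),
  dLoc = 1L:5L
)
DLSlist <- lapply(
  list(dls11t12 = c("2011-10-02", "2012-04-01"),
       dls12t13 = c("2012-10-07", "2013-04-07"),
       dls13t14 = c("2013-10-06", "2014-04-06")),
  as.POSIXct
)

# Option 1 (hypothetical helper): OR together one logical vector per range
DLStest_reduce <- function(dd, DLSobj) {
  Reduce(`|`, lapply(DLSobj, function(X) dd %between% X))
}

# Option 2: collect all starts and ends, then let inrange() do the work
starts <- do.call(c, unname(lapply(DLSlist, `[`, 1L)))
ends   <- do.call(c, unname(lapply(DLSlist, `[`, 2L)))

IR1[inrange(tstamp, starts, ends), dLoc := dLoc + 1000L]
IR1$dLoc
# [1] 1001    2    3 1004 1005
```

With the question's data, IR1[DLStest_reduce(tstamp, DLSlist), dLoc := dLoc + 1000L] should select the same rows as DLStest. Reduce is also more robust for a single timestamp: there sapply() simplifies to a plain vector rather than a matrix, and apply() then fails because its input has no dim.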