There are several ways to achieve the desired result using dcast
. jazzurro's solution does the aggregation before reshaping the result. The approaches here use dcast
directly but may require some post-processing. We are using jazzurro's data which are tweaked to obey the UTC
time zone and CRAN version 1.10.0 of data.table
.
1. Getting ifelse
to work
As reported in the Q,
dcast(
dt, person + door ~ type,
value.var = 'time',
fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
)
returns an error message. The full text of the error message includes the hint to use the fill
parameter. Unfortunately, ifelse()
doesn't respect the POSIXct
class (for details see ?ifelse
) so this needs to be enforced.
With
dcast(
dt, person + door ~ type,
value.var = 'time',
fun.aggregate = function(x)
lubridate::as_datetime(ifelse(type == 'timeIn', min(x), max(x))),
fill = 0
)
we do get
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
2. Alternative to ifelse
ifelse
's help page suggests
(tmp <- yes; tmp[!test] <- no[!test]; tmp)
as alternative. Following this advice,
dcast(
dt, person + door ~ type,
value.var = 'time',
fun.aggregate = function(x) {
test <- type == "timeIn"; tmp <- min(x); tmp[!test] = max(x)[!test]; tmp
}
)
returns
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
Note that neither the fill
parameter nor the coercion to POSIXct
is needed.
3. Using enhanced dcast
With the latest versions of dcast.data.table
we can provide a list of functions to fun.aggregate
:
dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))
returns
# person door time_min_timeIn time_min_timeOut time_max_timeIn time_max_timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:03 2016-12-02 07:06:02 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:03 2016-12-02 06:05:02 2016-12-02 06:05:05
We can remove the unwanted columns and rename the others by
dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))[
, .(person, door, timeIn = time_min_timeIn, timeOut = time_max_timeOut)]
which gets us
# person door timeIn timeOut
#1: ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2: bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
Data
As mentioned above, we are using jazzurro's data
dt <- structure(list(person = c("bob", "bob", "bob", "bob", "ana",
"ana", "ana", "ana"), door = c("front door", "front door", "front door",
"front door", "front door", "front door", "front door", "front door"
), type = c("timeIn", "timeIn", "timeOut", "timeOut", "timeIn",
"timeIn", "timeOut", "timeOut"), time = structure(c(1480658701,
1480658702, 1480658703, 1480658705, 1480662361, 1480662362, 1480662363,
1480662365), class = c("POSIXct", "POSIXt"))), .Names = c("person",
"door", "type", "time"), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
but coerce the time zone to UTC
.
With
dt[, time := lubridate::with_tz(time, "UTC")]
we have
dt
# person door type time
#1: bob front door timeIn 2016-12-02 06:05:01
#2: bob front door timeIn 2016-12-02 06:05:02
#3: bob front door timeOut 2016-12-02 06:05:03
#4: bob front door timeOut 2016-12-02 06:05:05
#5: ana front door timeIn 2016-12-02 07:06:01
#6: ana front door timeIn 2016-12-02 07:06:02
#7: ana front door timeOut 2016-12-02 07:06:03
#8: ana front door timeOut 2016-12-02 07:06:05
independent of local time zone.