The intention is to summarize the Duration column of a data.table by applying sum, max, mode, min, and count. For the mode I use the function shown in "How to find the statistical mode?" and also the Mode function from the DescTools package.
The data used:
library(data.table)
dt<-data.table(
stringsAsFactors = FALSE,
ODUFault = c("NO","SI","NO","SI","NO",
"SI","NO","SI","NO","SI","NO","SI","NO","SI","NO",
"SI","NO","SI","NO","SI"),
LastFault = c("sA","sB","sB","sB","sB",
"sB","sB","sB","sB","sB","sB","sC","sC","sB","sB",
"sB","sB","sB","sB","sB"),
SubFlt = c("A","B","B","B","B","B",
"B","B","B","B","B","C","C","B","B","B","B","B",
"B","B"),
Duration = c("00:09:40","00:03:01",
"00:06:58","00:03:00","00:06:59","00:03:00","00:06:58",
"00:03:01","00:06:59","00:02:59","00:07:29","00:03:01",
"00:06:29","00:05:03","00:04:56","00:03:00","00:06:59",
"00:02:59","00:07:00","00:15:33")
)
When the summary uses the median function, all outputs keep the "H:M:S" format:
dt[, Duration:=as.ITime(Duration)]
Summarize_SubFlt=dt[,list(g = sum(Duration),m=max(Duration),md=median(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
which(ODUFault =="SI"), .SD, by=SubFlt]
SubFlt ODUFault g m md n c
1: B SI 00:41:36 00:15:33 00:03:00 00:02:59 9
2: C SI 00:03:01 00:03:01 00:03:01 00:03:01 1
When the summary uses the mode function instead, every output loses the "H:M:S" format except the output of the mode function itself:
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
Summarize_SubFlt2=dt[,list(g = sum(Duration),m=max(Duration),md=getmode(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
which(ODUFault =="SI"), .SD, by=SubFlt]
SubFlt ODUFault g m md n c
1: B SI 2496 933 00:03:00 179 9
2: C SI 181 181 00:03:01 181 1
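The difference can also be inspected outside the grouped call. The snippet below is only a minimal sketch: it compares the class returned by getmode with the class returned by sum on a small ITime vector, and then tries one possible workaround, which is to convert the plain seconds back with as.ITime. The names x, total and longest are made up for this illustration, and I am assuming that as.ITime interprets a plain integer as seconds since midnight; I have not checked how totals of 24 hours or more behave.

```r
library(data.table)

# Same mode helper as in the question: most frequent value, first one wins on ties.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Small ITime vector built from a few of the Duration values (illustrative only).
x <- as.ITime(c("00:03:01", "00:03:00", "00:03:00", "00:15:33"))

# Compare the classes of the two results.
class(getmode(x))
class(sum(x))

# Possible workaround (assumption): aggregate the raw seconds, then convert
# the result back to ITime so it prints as H:M:S again.
total   <- as.ITime(sum(as.integer(x)))
longest <- as.ITime(max(as.integer(x)))
format(total)
format(longest)
```

If this behaves the same inside the grouped call, wrapping each aggregate, for example g = as.ITime(sum(Duration)), might restore the format, but I am not sure it is the recommended approach.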
#Structure of Summarize_SubFlt
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : 'ITime' int 00:41:36 00:03:01
$ m : 'ITime' int 00:15:33 00:03:01
$ md : 'ITime' num 00:03:00 00:03:01
$ n : 'ITime' int 00:02:59 00:03:01
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
#Structure of Summarize_SubFlt2
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : int 2496 181
$ m : int 933 181
$ md : 'ITime' int 00:03:00 00:03:01
$ n : int 179 181
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
#Structure of Summarize_SubFlt3 using Mode from library(DescTools)
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : 'ITime' int 00:41:36 00:03:01
$ m : 'ITime' int 00:15:33 00:03:01
$ md : 'ITime' num 00:03:00 00:03:01
$ n : 'ITime' int 00:02:59 00:03:01
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
How can I keep the "H:M:S" format for all summary outputs?