My dataset has a number of duration columns that have both positive and negative values, like this:
require(lubridate)
set.seed(43)
dt=data.table(A=as.duration(rnorm(100)*100), B=as.duration(rnorm(100)*100), C=as.duration(rnorm(100)*100))
I need to generate summaries for each column, but ONLY for the negative values in each, like this:
a=dt[as.numeric(A)<=0, summary(as.numeric(A))]
b=dt[as.numeric(B)<=0, summary(as.numeric(B))]
c=dt[as.numeric(B)<=0, summary(as.numeric(B))]
results1=data.table(as.list(a),as.list(b),as.list(c))
results1
V1 V2 V3
-208.7355428 -237.0840684 -237.0840684
-109.9255927 -90.91095008 -90.91095008
-64.83885801 -72.52487746 -72.52487746
-70.87867962 -74.7173011 -74.7173011
-25.19085368 -38.76434599 -38.76434599
-1.009403041 -1.733105648 -1.733105648
My dataset has many more columns like A..C, so writing individual summary statements for each variable becomes tedious. Ideally I would like to do this by iterating through each one of those columns. I tried this:
for (i in 1:3) {
#col_name=paste("summary",i,sep="_")
results[i] = dt[as.numeric(dt[[i]])<=0, .(summary(as.numeric(dt[[i]])))]
}
as.data.table(results)
V1 V2 V3
1: -208.7355428 -237.084068 -294.9729
2: -64.0973647 -65.496083 -76.02132
3: 0.1557914 8.544047 14.264934
4: 6.2315259 9.399898 0.5640193
5: 58.6766806 76.905693 77.687728
6: 261.6953128 211.874016 272.82491
It seems that in the for loop, the summary is performed on ALL values (both positive AND negative), and the sub setting is being ignored. What am doing wrong? And is there a way to capture the label of each value as well (i.e the row names should be Min, 1stQU, Median, Mean, 3rdQu, Max). Thank you.