0

My dataset has a number of duration columns that have both positive and negative values, like this:

require(lubridate)
set.seed(43)
dt=data.table(A=as.duration(rnorm(100)*100), B=as.duration(rnorm(100)*100), C=as.duration(rnorm(100)*100))

I need to generate summaries for each column, but ONLY for the negative values in each, like this:

a=dt[as.numeric(A)<=0, summary(as.numeric(A))]
b=dt[as.numeric(B)<=0, summary(as.numeric(B))]
c=dt[as.numeric(B)<=0, summary(as.numeric(B))]
results1=data.table(as.list(a),as.list(b),as.list(c))
results1
     V1               V2            V3 
-208.7355428    -237.0840684    -237.0840684 
-109.9255927    -90.91095008    -90.91095008 
-64.83885801    -72.52487746    -72.52487746 
-70.87867962    -74.7173011     -74.7173011 
-25.19085368    -38.76434599    -38.76434599 
-1.009403041    -1.733105648    -1.733105648

My dataset has many more columns like A..C, so writing individual summary statements for each variable becomes tedious. Ideally I would like to do this by iterating through each one of those columns. I tried this:

for (i in 1:3) {
    #col_name=paste("summary",i,sep="_")
    results[i] = dt[as.numeric(dt[[i]])<=0, .(summary(as.numeric(dt[[i]])))]   
               }
as.data.table(results)
         V1              V2        V3 
1:  -208.7355428    -237.084068 -294.9729 
2:  -64.0973647     -65.496083  -76.02132 
3:  0.1557914       8.544047    14.264934 
4:  6.2315259       9.399898    0.5640193 
5:  58.6766806      76.905693   77.687728 
6:  261.6953128     211.874016  272.82491

It seems that in the for loop, the summary is performed on ALL values (both positive AND negative), and the sub setting is being ignored. What am doing wrong? And is there a way to capture the label of each value as well (i.e the row names should be Min, 1stQU, Median, Mean, 3rdQu, Max). Thank you.

maximus
  • 154
  • 8
  • How about the solution posted [here](https://stackoverflow.com/questions/50710266/r-data-table-sum-all-columns-whose-names-are-stored-in-a-vector/50710336#50710336)? You can easily add a condition to the syntax given there. – Michael Ohlrogge Jun 11 '18 at 00:08
  • 4
    lapply(.SD, function(x) summary (x[x<0])) – MichaelChirico Jun 11 '18 at 00:16
  • Thank you MichaelChirico , was not aware of (x[x<0]) syntax. Unfortunately i don't see a way to "accept" your answer. – maximus Jun 11 '18 at 01:14
  • Marius: To make sure someone sees your message you can use the @ symbol like “@MichaelChirico please see OP’s request for upgrading from comment to answer above” – Frank Jun 11 '18 at 05:36

0 Answers0