Difference between rowMeans() and apply(.., mean) on data.table

Question

Given a data.table as follows:

library(data.table)

DT <- as.data.table(
  cbind(PREC_01N=c(0.0,0.25,2.29,9.77,26.00,0.93,0.00,5.54,9.91,0.00,0.01,0.0), 
        PREC_01P=c(1.73,0.00,0.01,7.55,0.00,0.11,65.09,13.60,7.09,13.87,5.15,0.87),
        PREC_02N=c(0.0,0.26,0.00,9.58,1.50,2.46,0.03,4.94,0.00,1.53,6.11,0.02),
        PREC_02P=c(0.33,57.20,10.95,2.89,0.81,2.59,0.00,4.63,11.05,1.53,10.43,1.98),
        PREC_03N=c(1.26,0.04,0.00,27.25,0.00,3.87,0.01,0.48,17.73,0.05,12.14,0.02),
        PREC_03P=c(0.21,5.74,0.00,1.59,23.35,1.36,0.00,3.75,6.14,0.37,0.00,0.00),
        PREC_04N=c(0.00,0.34,1.52,15.20,0.00,3.43,0.07,0.00,0.01,15.12,25.55,0.04),
        PREC_04P=c(5.42,9.13,20.64,12.68,35.68,27.05,0.00,0.02,0.00,1.60,0.00,0.67),
        PREC_05N=c(0.03,3.56,0.08,9.98,0.01,3.94,0.32,0.00,15.58,0.01,0.00,0.00),
        PREC_05P=c(0.21,0.02,57.97,0.01,0.00,4.31,0.00,1.55,13.03,0.07,54.75,0.78),
        PREC_06N=c(0.19,4.08,0.10,12.22,0.00,0.72,0.03,0.09,15.19,0.01,9.29,0.18),
        PREC_06P=c(0.05,0.59,0.29,6.65,35.56,14.02,0.02,0.38,13.46,0.00,1.07,0.00),
        PREC_07N=c(0.42,4.50,11.36,3.34,4.04,0.02,0.03,0.00,1.66,0.00,9.44,0.00),
        PREC_07P=c(0.35,10.37,13.12,13.24,8.29,30.73,0.72,0.01,9.74,0.75,5.77,0.00),
        PREC_AVN=c(1.26,0.00,16.92,13.09,1.43,6.13,0.00,12.10,8.23,1.00,7.99,0.00)
  ))

For testing, I create two columns, each holding the row-wise mean of the 15 columns, using two different approaches:

DT[, PREC_MEAN  := rowMeans(DT[, 1:15, with=FALSE])]        # Create column PREC_MEAN - FASTER
DT[, PREC_MEAN2 := apply(DT[, 1:15, with=FALSE], 1, mean)]  # Create column PREC_MEAN2 - SLOWER

To my surprise, the two columns differ in some rows:

identical(DT$PREC_MEAN, DT$PREC_MEAN2)             # FALSE ?????
DTbad <- DT$PREC_MEAN != DT$PREC_MEAN2             # Logical vector 
sum(DTbad)                                         # 10 inequalities????
DT <- cbind(ROWID=1:nrow(DT),DT)                   # Adding a ROWID col to create the IDENTICAL column
DT[,IDENTICAL:=identical(PREC_MEAN, PREC_MEAN2), by=ROWID]  # By the way, is there another easier way?
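As an aside on the question in the last comment above, here is a minimal sketch of a per-row equality flag that needs no ROWID grouping. It assumes two plain numeric vectors, `a` and `b`, as hypothetical stand-ins for the two mean columns; `==` already compares element by element.

```r
# Hypothetical stand-ins for PREC_MEAN and PREC_MEAN2 (not the real data).
a <- c(1.5, 2.25, 3.0)
b <- c(1.5, 2.25, 3.5)

# `==` is vectorized: it returns one logical value per position.
same <- a == b
print(same)

# On the data.table from the question, the same idea would presumably be:
#   DT[, IDENTICAL := PREC_MEAN == PREC_MEAN2]
# Unlike identical(), `==` yields NA wherever either value is NA.
```

The grouped `identical()` call and the vectorized `==` should agree whenever neither column contains missing values.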

In 10 of the 12 rows, the two MEAN values are different!

DT[, list(PREC_MEAN, PREC_MEAN2, IDENTICAL)]          # What is different?
DT[, list(format(PREC_MEAN, scientific = TRUE), format(PREC_MEAN2, scientific = TRUE), IDENTICAL)]  # Trying via scientific notation
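One possible reason the printed values look the same is that `format()` rounds to a limited number of significant digits by default. The sketch below uses two hypothetical values, `x` and `y`, that are known to differ only in the last bits, and prints them with 17 significant digits via `sprintf()`; it illustrates the display issue only and is not a diagnosis of the data above.

```r
# Two doubles that are close but not equal (the classic 0.1 + 0.2 case).
x <- 0.1 + 0.2
y <- 0.3

# Default formatting rounds both to the same string, hiding the gap.
short_x <- format(x, scientific = TRUE)
short_y <- format(y, scientific = TRUE)
print(c(short_x, short_y))

# 17 significant digits are enough to tell any two distinct doubles apart.
long_x <- sprintf("%.17g", x)
long_y <- sprintf("%.17g", y)
print(c(long_x, long_y))

# The raw difference shows how small the discrepancy is.
print(x - y)
```

Applying the same `sprintf("%.17g", ...)` call to PREC_MEAN and PREC_MEAN2 should reveal whether the mismatches sit in the last few digits.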

DT is a subset of a 572,400 x 66 data.table. Running the same process on the full table showed the same 10 differences that I have reproduced here; I added 2 matching cases, the first and the last rows.

Does anyone know what is happening? Why such differences?

Thanks in advance.

Yes it is. Read the R-FAQ for this and many other issues to remember. — IRTFM, Aug 19 '14 at 18:33
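The comment does not name the FAQ entry; presumably it is the one on comparing floating-point numbers (assumed here to be R FAQ 7.31). Under that assumption, the sketch below compares the two approaches on a small base R matrix built from the first three rows and four columns of the data above. It measures the largest gap and tests equality within a tolerance using `all.equal()` instead of `identical()`.

```r
# Base R only; a small matrix stands in for part of DT.
m <- cbind(PREC_01N = c(0.00, 0.25, 2.29),
           PREC_01P = c(1.73, 0.00, 0.01),
           PREC_02N = c(0.00, 0.26, 0.00),
           PREC_02P = c(0.33, 57.20, 10.95))

fast <- rowMeans(m)          # row means from one vectorized call
slow <- apply(m, 1, mean)    # mean() called once per row

exact_match <- fast == slow                    # bit-for-bit test; may contain FALSE
max_gap     <- max(abs(fast - slow))           # size of the largest discrepancy
close_match <- isTRUE(all.equal(fast, slow))   # equality within a small tolerance

print(exact_match)
print(max_gap)
print(close_match)
```

If the largest gap is many orders of magnitude smaller than the values themselves, the two results agree for practical purposes, and a tolerance-based comparison is the more appropriate check.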
