Given the following data.table:
DT <- as.data.table(
cbind(PREC_01N=c(0.0,0.25,2.29,9.77,26.00,0.93,0.00,5.54,9.91,0.00,0.01,0.0),
PREC_01P=c(1.73,0.00,0.01,7.55,0.00,0.11,65.09,13.60,7.09,13.87,5.15,0.87),
PREC_02N=c(0.0,0.26,0.00,9.58,1.50,2.46,0.03,4.94,0.00,1.53,6.11,0.02),
PREC_02P=c(0.33,57.20,10.95,2.89,0.81,2.59,0.00,4.63,11.05,1.53,10.43,1.98),
PREC_03N=c(1.26,0.04,0.00,27.25,0.00,3.87,0.01,0.48,17.73,0.05,12.14,0.02),
PREC_03P=c(0.21,5.74,0.00,1.59,23.35,1.36,0.00,3.75,6.14,0.37,0.00,0.00),
PREC_04N=c(0.00,0.34,1.52,15.20,0.00,3.43,0.07,0.00,0.01,15.12,25.55,0.04),
PREC_04P=c(5.42,9.13,20.64,12.68,35.68,27.05,0.00,0.02,0.00,1.60,0.00,0.67),
PREC_05N=c(0.03,3.56,0.08,9.98,0.01,3.94,0.32,0.00,15.58,0.01,0.00,0.00),
PREC_05P=c(0.21,0.02,57.97,0.01,0.00,4.31,0.00,1.55,13.03,0.07,54.75,0.78),
PREC_06N=c(0.19,4.08,0.10,12.22,0.00,0.72,0.03,0.09,15.19,0.01,9.29,0.18),
PREC_06P=c(0.05,0.59,0.29,6.65,35.56,14.02,0.02,0.38,13.46,0.00,1.07,0.00),
PREC_07N=c(0.42,4.50,11.36,3.34,4.04,0.02,0.03,0.00,1.66,0.00,9.44,0.00),
PREC_07P=c(0.35,10.37,13.12,13.24,8.29,30.73,0.72,0.01,9.74,0.75,5.77,0.00),
PREC_AVN=c(1.26,0.00,16.92,13.09,1.43,6.13,0.00,12.10,8.23,1.00,7.99,0.00)
))
For testing, I create two columns, each holding the row mean of the 15 columns, using two different approaches:
DT[, PREC_MEAN := rowMeans(DT[, 1:15, with = FALSE])]        # Column PREC_MEAN via rowMeans() - FASTER
DT[, PREC_MEAN2 := apply(DT[, 1:15, with = FALSE], 1, mean)] # Column PREC_MEAN2 via apply() + mean() - SLOWER
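For reference, the same two computations can also be written with `.SD` and `.SDcols`, so that `DT` is not re-subset inside its own `[`. This is a minimal sketch on a small hypothetical table `toy` (not the question's data); the column names `A`, `B`, `C` and the result names `MEAN_ROW`/`MEAN_APPLY` are illustrative only:

```r
library(data.table)

# Hypothetical 3-row table standing in for the PREC_* columns (not the original data)
toy <- data.table(A = c(0.00, 0.25, 2.29),
                  B = c(1.73, 0.00, 0.01),
                  C = c(0.00, 0.26, 0.00))
cols <- c("A", "B", "C")  # columns to average; in DT this would be names(DT)[1:15]

# Approach 1: rowMeans() over the selected columns, passed in as .SD
toy[, MEAN_ROW := rowMeans(.SD), .SDcols = cols]

# Approach 2: apply() calls mean() once per row of .SD
toy[, MEAN_APPLY := apply(.SD, 1, mean), .SDcols = cols]

print(toy)
```

Because `.SDcols` restricts `.SD` to the listed columns, the second assignment does not pick up `MEAN_ROW`.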
To my surprise, PREC_MEAN and PREC_MEAN2 differ in some rows:
identical(DT$PREC_MEAN, DT$PREC_MEAN2) # FALSE ?????
DTbad <- DT$PREC_MEAN != DT$PREC_MEAN2 # Logical vector
sum(DTbad) # 10 inequalities????
DT <- cbind(ROWID=1:nrow(DT),DT) # Adding a ROWID col to create the IDENTICAL column
DT[, IDENTICAL := identical(PREC_MEAN, PREC_MEAN2), by = ROWID] # By the way, is there an easier way?
10 of the 12 rows show different mean values!
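One possible simpler way to build such a flag is a vectorised comparison, which needs neither a ROWID column nor `by=`. A minimal sketch on a hypothetical two-row table `chk` (values chosen so that row 1 differs only in the last bits); note that, unlike `identical()`, `==` returns `NA` when either value is `NA`, and the tolerance in the second column is an assumed choice:

```r
library(data.table)

# Hypothetical stand-in for DT: row 1 differs only in the last bits, row 2 matches exactly
chk <- data.table(PREC_MEAN  = c(0.1 + 0.2, 1.5),
                  PREC_MEAN2 = c(0.3,       1.5))

# Exact comparison for all rows at once; no ROWID column or by= grouping needed.
# Unlike identical(), == returns NA when either value is NA.
chk[, IDENTICAL := PREC_MEAN == PREC_MEAN2]

# Tolerance-based comparison. The tolerance is an assumption: its value equals
# all.equal()'s default, but this check is absolute rather than relative.
tol <- sqrt(.Machine$double.eps)
chk[, NEAR_EQUAL := abs(PREC_MEAN - PREC_MEAN2) < tol]

print(chk)
```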
DT[, list(PREC_MEAN, PREC_MEAN2, IDENTICAL)] # What is different?
DT[, list(format(PREC_MEAN, scientific = TRUE), format(PREC_MEAN2, scientific = TRUE), IDENTICAL)] # Trying scientific notation
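`format()` still defaults to 7 significant digits (`getOption("digits")`), so scientific notation alone may hide differences in the last bits of a double. A sketch of one way to look at the full stored values, using a hypothetical pair of numbers `a` and `b` rather than the question's data:

```r
# Hypothetical pair of doubles that print identically at the default precision
a <- 0.1 + 0.2
b <- 0.3

print(c(a, b))              # default printing uses 7 significant digits
sprintf("%.17g", c(a, b))   # 17 significant digits expose the stored binary values
a - b                       # magnitude of the discrepancy
isTRUE(all.equal(a, b))     # equality within all.equal()'s default tolerance
```

The analogous calls on the question's table would be, for example, `DT[, .(sprintf("%.17g", PREC_MEAN), sprintf("%.17g", PREC_MEAN2))]` or `DT[, PREC_MEAN - PREC_MEAN2]`.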
DT is a subset of a 572,400 x 66 data.table. Running the same process on the full table showed the same 10 differences, which I have reproduced here; I also added 2 rows that do match (the first and the last).
Does anyone know what is happening? Why do these differences occur?
Thanks in advance.