Why the inconsistency between colSums and apply for sparse (dgCMatrix) matrices in R?

Question

Summing using Matrix::colSums() gives different results than apply(, MARGIN = 2, FUN = sum) in the code below.

library(Matrix)
# Initialise RNG
set.seed(3737)
# Parameters
nr <- 2000  # Number of rows
nc <- 5  # Number of columns
nz <- round(nr / 10)  # Number of non-zero entries in each column
# Create a sparse matrix with given numbers or rows, columns and non-zero entries
a.sparse <- Matrix(0, nrow = nr, ncol =nc, sparse = TRUE)
for( c. in 1:nc ) {
  a.sparse[sample(nr, nz), c.] <- rnorm(nz)
}
# Column sums are inconsistent between colSums and apply
colSums(a.sparse) - apply(a.sparse, MARGIN = 2, FUN = sum)
# Column sums are consistent between colSums and apply in dense case
a.dense <- as.matrix(a.sparse)
class(a.dense)
colSums(a.dense) - apply(a.dense, MARGIN = 2, FUN = sum)

For example in the colSums(a.sparse) - apply(a.sparse, MARGIN = 2, FUN = sum) line I have as output:

3.907985e-14  2.486900e-14  1.421085e-13 -1.421085e-13  0.000000e+00

but the dense case works correctly. Can anyone shed any light on what is causing the inconsistency here? The errors are orders of magnitude greater than .Machine$double.eps and are causing instability in the software I am analysing.

All your numbers are equal to zero within a margin of error. Don't use `.Machine$double.eps` as your fuzz, that assumes all the lower-level code never propagates rounding errors. Use something closer to 1e-7 or 1e-9 instead. — Hong Ooi, Oct 10 '18 at 16:04
I know the differences are small but they are relevant in my case. These differences are causing instability in some 3rd party software I am analysing. Also looking at it the other way around all numbers are close to zero within *some* margin of error. However big the error is I would still like to understand what is introducing it. — Epimetheus, Oct 10 '18 at 16:27
Go to the David Goldberg article and read up on floating point summation — Hong Ooi, Oct 10 '18 at 16:28
I'm not sure you read the question sufficiently closely. I'm not choosing a level of fuzz. The level of "fuzz" is determined by something outside of my control. I'm interested in why the sums are different in the first place. A general reference to a 94 page manuscript is not very helpful. The stackoverflow you claim answers my question is different, it deals with why a + b != c, not why a + b != a + b. — Epimetheus, Oct 10 '18 at 19:29

Why the inconsistency between colSums and apply for sparse (dgCMatrix) matrices in R?

0 Answers0