0

Summing using Matrix::colSums() gives different results than apply(, MARGIN = 2, FUN = sum) in the code below.

library(Matrix)
# Initialise RNG
set.seed(3737)
# Parameters
nr <- 2000  # Number of rows
nc <- 5  # Number of columns
nz <- round(nr / 10)  # Number of non-zero entries in each column
# Create a sparse matrix with given numbers or rows, columns and non-zero entries
a.sparse <- Matrix(0, nrow = nr, ncol =nc, sparse = TRUE)
for( c. in 1:nc ) {
  a.sparse[sample(nr, nz), c.] <- rnorm(nz)
}
# Column sums are inconsistent between colSums and apply
colSums(a.sparse) - apply(a.sparse, MARGIN = 2, FUN = sum)
# Column sums are consistent between colSums and apply in dense case
a.dense <- as.matrix(a.sparse)
class(a.dense)
colSums(a.dense) - apply(a.dense, MARGIN = 2, FUN = sum)

For example in the colSums(a.sparse) - apply(a.sparse, MARGIN = 2, FUN = sum) line I have as output:

3.907985e-14  2.486900e-14  1.421085e-13 -1.421085e-13  0.000000e+00

but the dense case works correctly. Can anyone shed any light on what is causing the inconsistency here? The errors are orders of magnitude greater than .Machine$double.eps and are causing instability in the software I am analysing.

Epimetheus
  • 1,119
  • 1
  • 10
  • 19
  • All your numbers are equal to zero within a margin of error. Don't use `.Machine$double.eps` as your fuzz, that assumes all the lower-level code never propagates rounding errors. Use something closer to 1e-7 or 1e-9 instead. – Hong Ooi Oct 10 '18 at 16:04
  • I know the differences are small but they are relevant in my case. These differences are causing instability in some 3rd party software I am analysing. Also looking at it the other way around all numbers are close to zero within *some* margin of error. However big the error is I would still like to understand what is introducing it. – Epimetheus Oct 10 '18 at 16:27
  • 1
    Go to the David Goldberg article and read up on floating point summation – Hong Ooi Oct 10 '18 at 16:28
  • I'm not sure you read the question sufficiently closely. I'm not choosing a level of fuzz. The level of "fuzz" is determined by something outside of my control. I'm interested in why the sums are different in the first place. A general reference to a 94 page manuscript is not very helpful. The stackoverflow you claim answers my question is different, it deals with why a + b != c, not why a + b != a + b. – Epimetheus Oct 10 '18 at 19:29

0 Answers0