I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.
I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast
and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.
The original code was something along the lines of
ans <- apply(object, 1, function(x) sum((x - a) / b))
and my code is
ans = Rfast:::rowsums((object-a)/b)
I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?
Edit
Trying to reproduce the error is pretty hard...
I have been able to isolate the discrepancy to when I divide by my vector b
with entries each ~ 3000 (i.e. [3016.460436, 3021.210321, 3033.3303219]
. If I take this term out the two methods give the same answer.
I then tried two methods to improve my answer, one was dividing b
by 1000 then dividing the sum by 1000 at the end. This didn't work, presumably because the float precision is the same either way.
I also tried forcing my b
vector to be integers, which also didn't work.
Sample data doesn't reproduce my error either, which is frustrating...
objmat = rbind(rep(c(1,0,0),1000),rep(c(0,0,1),1000))
amat = rbind(rep(c(0.064384654, 0.025465132, 0.36543214),1000))
bmat = rbind(rep(c(1016.460431,1021.210431,1033.330431),1000))
ans = apply(objmat,1,function(x) sum((x-amat)/bmat))
gives
ans[1] = 0.5418828413
rowsums((objmat[1,]-amat)/bmat) = 0.5418828413
I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply
or rowsums
) would be more accurate!