0

I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.

I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.

The original code was something along the lines of

ans <- apply(object, 1, function(x) sum((x - a) / b))

and my code is

ans = Rfast:::rowsums((object-a)/b)

I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?

Edit

Trying to reproduce the error is pretty hard...

I have been able to isolate the discrepancy to when I divide by my vector b with entries each ~ 3000 (i.e. [3016.460436, 3021.210321, 3033.3303219]. If I take this term out the two methods give the same answer.

I then tried two methods to improve my answer, one was dividing b by 1000 then dividing the sum by 1000 at the end. This didn't work, presumably because the float precision is the same either way.

I also tried forcing my b vector to be integers, which also didn't work.

Sample data doesn't reproduce my error either, which is frustrating...

objmat = rbind(rep(c(1,0,0),1000),rep(c(0,0,1),1000))
amat = rbind(rep(c(0.064384654, 0.025465132, 0.36543214),1000))
bmat = rbind(rep(c(1016.460431,1021.210431,1033.330431),1000))
ans = apply(objmat,1,function(x) sum((x-amat)/bmat))

gives

ans[1] = 0.5418828413

rowsums((objmat[1,]-amat)/bmat) = 0.5418828413

I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply or rowsums) would be more accurate!

student
  • 1,001
  • 2
  • 12
  • 24
  • What is `a` and `b`. It may depend on the length of 'a' and 'b', In the `apply` you are looping over rows and I am assuming that a and b length will be the length of number of columns of the object. – akrun Oct 21 '21 at 17:35
  • 1
    Maybe none of the methods is throwing away precision or making rounding errors: `(1/5 + 3/5) - 4/5 == 1/5 + (3/5 - 4/5)`. This might be R [FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) or [Why are these numbers not equal?](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal). – Rui Barradas Oct 21 '21 at 18:10
  • Can you provide a reproducible example? We don't need your (large) `object`, a small one would work just fine. I'm inferring the differences are sub-decimal, in which case one might use `runif` or another random function to generate it; if so, please remember to use `set.seed` so that we can see the same numbers/results. Thanks. – r2evans Oct 21 '21 at 18:24
  • This post contains inadequate information to diagnose the issue beyond speculation that it is due to the normal rounding that occurs in floating-point operations. It does not even show an example of two different answers, let alone the inputs and actual code that produce them. – Eric Postpischil Oct 23 '21 at 10:40
  • I get that! I'm working with some loaded data, but when I try to using dummy data to reproduce that error I can't, so speculation is all that I was hoping for! – Ned Booker Oct 25 '21 at 22:30

0 Answers0