2

I have an issue where for a fixed dataset, simple vector or matrix multiplications in R sometimes give me huge results, indicating some sort of numeric instability. Oddly, however, this happens only occasionally and for identical operations performed on identical inputs. I find this very puzzling!

Here is a minimum viable example:

set.seed(777)
count_elements <- 2000000
runs <- 100
res <- rep(0, runs)
for (i in 1:runs) {
  x <- rnorm(count_elements)
  y <- rnorm(count_elements)
  res[i] <- t(x) %*% y
}
res

I have executed the above block multiple times (in its entirety, including the set seed command at the beginning). Mostly, I get the exact same results - as expected:

  [1]  -897.68389   834.06812  -882.26393  -926.86321  2012.72503  -356.15314  -359.85574   451.91216  1277.70927  1363.24631 -1644.97030 -2419.13855
 [13] -2172.79840 -1326.87133 -1511.33608  1348.51443  2665.13645  -762.42060  2993.01180   697.97725  1382.22834 -1203.22142 -1742.17132  -161.99406
 [25]   296.25066   326.62533  1209.45284  -762.32908   279.24635   493.08010 -1077.79078 -2505.92488  1960.47937  2129.43811   207.21175  2029.61236
 [37]  1770.29784 -3019.54165 -2713.60022   714.01328  1213.01295  -211.48069  -410.20189   -70.66189  1594.70185  2080.04606  -912.31666 -1638.39288
 [49]  1522.47634  3205.78793  1013.93541 -1991.66930   105.89708  1208.73446   168.29954  1185.06517  -480.73878  -132.51146 -1054.01127   949.65380
 [61]  -676.56834  1571.62409  -415.25738   701.38207 -2263.40872  2481.35186   306.89755    45.67761  1369.27758   343.21505 -1568.65000 -2354.67460
 [73]  1359.26185   407.53511   458.08214   311.50405  1578.12875  -437.51657  1500.52921 -2012.22430  -739.64431  -221.48344   319.24941 -2181.36662
 [85] -1800.50088  -540.30153   395.05350 -1170.72800  -190.94960 -3134.81671   800.47275   388.55436 -3052.48226   -31.90670  1415.19803  -721.63404
 [97] -1284.55373 -1700.18037  -775.71142   739.95812

However, on some executions one or more of the values get replaced with a very large value. For example, I have also seen the following output. The only difference is that run 42 (i.e. element 42 in the vector res) has resulted in the huge value 1.675104e+308 (instead of the correct value of -211.48069):

  [1]  -8.976839e+02   8.340681e+02  -8.822639e+02  -9.268632e+02   2.012725e+03  -3.561531e+02  -3.598557e+02   4.519122e+02   1.277709e+03   1.363246e+03
 [11]  -1.644970e+03  -2.419139e+03  -2.172798e+03  -1.326871e+03  -1.511336e+03   1.348514e+03   2.665136e+03  -7.624206e+02   2.993012e+03   6.979772e+02
 [21]   1.382228e+03  -1.203221e+03  -1.742171e+03  -1.619941e+02   2.962507e+02   3.266253e+02   1.209453e+03  -7.623291e+02   2.792463e+02   4.930801e+02
 [31]  -1.077791e+03  -2.505925e+03   1.960479e+03   2.129438e+03   2.072117e+02   2.029612e+03   1.770298e+03  -3.019542e+03  -2.713600e+03   7.140133e+02
 [41]   1.213013e+03  1.675104e+308  -4.102019e+02  -7.066189e+01   1.594702e+03   2.080046e+03  -9.123167e+02  -1.638393e+03   1.522476e+03   3.205788e+03
 [51]   1.013935e+03  -1.991669e+03   1.058971e+02   1.208734e+03   1.682995e+02   1.185065e+03  -4.807388e+02  -1.325115e+02  -1.054011e+03   9.496538e+02
 [61]  -6.765683e+02   1.571624e+03  -4.152574e+02   7.013821e+02  -2.263409e+03   2.481352e+03   3.068976e+02   4.567761e+01   1.369278e+03   3.432150e+02
 [71]  -1.568650e+03  -2.354675e+03   1.359262e+03   4.075351e+02   4.580821e+02   3.115040e+02   1.578129e+03  -4.375166e+02   1.500529e+03  -2.012224e+03
 [81]  -7.396443e+02  -2.214834e+02   3.192494e+02  -2.181367e+03  -1.800501e+03  -5.403015e+02   3.950535e+02  -1.170728e+03  -1.909496e+02  -3.134817e+03
 [91]   8.004728e+02   3.885544e+02  -3.052482e+03  -3.190670e+01   1.415198e+03  -7.216340e+02  -1.284554e+03  -1.700180e+03  -7.757114e+02   7.399581e+02

The issue appears to be related to the underlying floating point operations, or at least something related to R's underlying linear algebra libraries. It is not due to the random number generation, since I see the same issue when performing linear algebra operations on a fixed proprietary dataset (that I can't share) without drawing any random variables. But I don't know how to investigate this further - so any pointers would be appreciated!

I am currently running R 3.5.3 on RStudio 1.2.1335 (on MacOS Sierra 10.12.6) but I also managed to reproduce this on a later version of R on the same computer!

Here's the result of running sessionInfo():

R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SGL_1.3            glmnetUtils_1.1.4  gplots_3.0.1.1     RColorBrewer_1.1-2 dplyr_0.8.3        lattice_0.20-38    Rfast_1.9.8       
 [8] RcppZiggurat_0.1.5 Rcpp_1.0.3         pracma_2.2.9      

loaded via a namespace (and not attached):
 [1] rstudioapi_0.10    magrittr_1.5       tidyselect_0.2.5   R6_2.4.1           rlang_0.4.2        foreach_1.4.7      stringr_1.4.0     
 [8] caTools_1.17.1.3   tools_3.5.3        grid_3.5.3         glmnet_2.0-18      KernSmooth_2.23-15 iterators_1.0.12   gtools_3.8.1      
[15] assertthat_0.2.1   tibble_2.1.3       crayon_1.3.4       Matrix_1.2-15      purrr_0.3.3        codetools_0.2-16   bitops_1.0-6      
[22] glue_1.3.1         stringi_1.4.3      gdata_2.18.0       compiler_3.5.3     pillar_1.4.3       pkgconfig_2.0.3   
A. G.
  • 159
  • 11
  • 2
    I cannot reproduce the error. Can you make it more clear if you run the same code from the start, i.e., resetting the RNG seed to `777` and then get the error described in the question? – Rui Barradas Jan 05 '20 at 21:19
  • Correct, I re-execute the entire code block as I copy-pasted it, including the set seed command. And it gives a consistent and correct result most of the time, but sometimes I get very huge values. I believe the 4th time that I executed that entire code block is when I saw the first huge value. – A. G. Jan 05 '20 at 21:21
  • Try to look here. Maybe it will solve your issue: https://stackoverflow.com/questions/20624698/fixing-set-seed-for-an-entire-session – DJV Jan 05 '20 at 21:57
  • Thanks for the suggestion, but this isn't the fault of the random number generation: I _want_ the same draws in order to have a fixed dataset to demonstrate the issue. I have a similar issue on a propriety dataset without any random numbers whatsoever. I think that suggestions to do with numerical precision and even R/RStudio installations would be very helpful? – A. G. Jan 05 '20 at 22:01
  • Wanting the same draws is the same idea of *fixing* the random number generator with set.seed. – DJV Jan 05 '20 at 22:10
  • And I do have the same draws. This is probably the fault of my writing, but I think you're missing the point of my question? – A. G. Jan 05 '20 at 22:15
  • 1
    I ran the code 30 times and it always returns the same `res[42]`, the correct one `-211.48069` so I really cannot reproduce this. R 3.6.2 on Ubuntu 19.10. – Rui Barradas Jan 05 '20 at 23:46
  • Nitpick: if `x` and `y` are vectors, `t(x) %*% y` and `x %*% y` give the same result. To see it, try, say, `t(1:4) %*% 1:4` and `1:4 %*% 1:4`. – Rui Barradas Jan 05 '20 at 23:59
  • Sounds a bit like an intermittent hardware fault with your RAM. Might be worth checking it. – Andrew Gustar Jan 19 '20 at 22:28
  • @AndrewGustar Interesting... anything you can recommend to follow up on this hypothesis? Thanks – A. G. Jan 20 '20 at 07:12
  • PS. the bounty expires today, so can you write this up as an answer with detail please? – A. G. Jan 20 '20 at 07:17

2 Answers2

1

The fact that this is intermittent and cannot be replicated leads me to think that this could be a hardware fault on your machine.

I had a similar issue on a windows PC a few years back, with intermittent errors in one particular piece of software, plus the occasional more serious system crash. I finally tracked it down to a fault with one of the DIMM memory modules. Replacing the RAM memory solved the problem.

Unfortunately it can be hard to pin these errors down precisely. There are several free memory testing packages available which run through various memory read/write tests to check for errors - your best bet is to search online and see which one looks suitable. I can't remember which one I used - it was several years ago and was for a PC not a Mac.

Best of luck!

Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
  • This is an interesting lead, and the only one I have - bounty awarded! I'll investigate this as a hardware issue now. – A. G. Jan 20 '20 at 16:49
0

I also couldn't replicate this issue but maybe you can easily prove it with this approach use reprex::reprex(venue = 'so',si = TRUE)

results_vector <- numeric(2)
for (j in 1:10) {
  set.seed(777)
  count_elements <- 2000000
  runs <- 100
  res <- rep(0, runs)
  for (i in 1:runs) {
    x <- rnorm(count_elements)
    y <- rnorm(count_elements)
    res[i] <- t(x) %*% y
  }

  results_vector[j] <- mean(res)
}

results_vector
#>  [1] -34.40672 -34.40672 -34.40672 -34.40672 -34.40672 -34.40672 -34.40672
#>  [8] -34.40672 -34.40672 -34.40672

Created on 2020-01-05 by the reprex package (v0.3.0)

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: elementary OS 5.1 Hera
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0
#>  [5] yaml_2.2.0      Rcpp_1.0.3      stringi_1.4.3   rmarkdown_2.0  
#>  [9] highr_0.8       knitr_1.26      stringr_1.4.0   xfun_0.11      
#> [13] digest_0.6.23   rlang_0.4.2     evaluate_0.14
Bruno
  • 4,109
  • 1
  • 9
  • 27
  • Shouldn't it be `results_vector <- numeric(10)`? With `10`, not `2`? – Rui Barradas Jan 05 '20 at 23:01
  • Yes they shold, not like it is going to matter in this case but still right – Bruno Jan 05 '20 at 23:53
  • This is what I get when I run your code block: `[1] -3.440672e+01 1.375315e+306 -3.440672e+01 -3.440672e+01 -3.440672e+01 -3.440672e+01 -3.440672e+01 -3.440672e+01 -3.440672e+01 [10] -3.440672e+01`. So I still have the issue! – A. G. Jan 13 '20 at 19:48
  • Also, I edited my original question above to include the result of running `sessionInfo()` like you asked for. Thanks in advance! – A. G. Jan 13 '20 at 19:49
  • To guarantee the same inputs. It's intentional: I don't want random values, I want the same values. (It's the point of my question!) – A. G. Jan 13 '20 at 20:18