I have an issue where for a fixed dataset, simple vector or matrix multiplications in R sometimes give me huge results, indicating some sort of numeric instability. Oddly, however, this happens only occasionally and for identical operations performed on identical inputs. I find this very puzzling!
Here is a minimum viable example:
set.seed(777)
count_elements <- 2000000
runs <- 100
res <- rep(0, runs)
for (i in 1:runs) {
x <- rnorm(count_elements)
y <- rnorm(count_elements)
res[i] <- t(x) %*% y
}
res
I have executed the above block multiple times (in its entirety, including the set seed command at the beginning). Mostly, I get the exact same results - as expected:
[1] -897.68389 834.06812 -882.26393 -926.86321 2012.72503 -356.15314 -359.85574 451.91216 1277.70927 1363.24631 -1644.97030 -2419.13855
[13] -2172.79840 -1326.87133 -1511.33608 1348.51443 2665.13645 -762.42060 2993.01180 697.97725 1382.22834 -1203.22142 -1742.17132 -161.99406
[25] 296.25066 326.62533 1209.45284 -762.32908 279.24635 493.08010 -1077.79078 -2505.92488 1960.47937 2129.43811 207.21175 2029.61236
[37] 1770.29784 -3019.54165 -2713.60022 714.01328 1213.01295 -211.48069 -410.20189 -70.66189 1594.70185 2080.04606 -912.31666 -1638.39288
[49] 1522.47634 3205.78793 1013.93541 -1991.66930 105.89708 1208.73446 168.29954 1185.06517 -480.73878 -132.51146 -1054.01127 949.65380
[61] -676.56834 1571.62409 -415.25738 701.38207 -2263.40872 2481.35186 306.89755 45.67761 1369.27758 343.21505 -1568.65000 -2354.67460
[73] 1359.26185 407.53511 458.08214 311.50405 1578.12875 -437.51657 1500.52921 -2012.22430 -739.64431 -221.48344 319.24941 -2181.36662
[85] -1800.50088 -540.30153 395.05350 -1170.72800 -190.94960 -3134.81671 800.47275 388.55436 -3052.48226 -31.90670 1415.19803 -721.63404
[97] -1284.55373 -1700.18037 -775.71142 739.95812
However, on some executions one or more of the values get replaced with a very large value. For example, I have also seen the following output. The only difference is that run 42 (i.e. element 42 in the vector res
) has resulted in the huge value 1.675104e+308
(instead of the correct value of -211.48069
):
[1] -8.976839e+02 8.340681e+02 -8.822639e+02 -9.268632e+02 2.012725e+03 -3.561531e+02 -3.598557e+02 4.519122e+02 1.277709e+03 1.363246e+03
[11] -1.644970e+03 -2.419139e+03 -2.172798e+03 -1.326871e+03 -1.511336e+03 1.348514e+03 2.665136e+03 -7.624206e+02 2.993012e+03 6.979772e+02
[21] 1.382228e+03 -1.203221e+03 -1.742171e+03 -1.619941e+02 2.962507e+02 3.266253e+02 1.209453e+03 -7.623291e+02 2.792463e+02 4.930801e+02
[31] -1.077791e+03 -2.505925e+03 1.960479e+03 2.129438e+03 2.072117e+02 2.029612e+03 1.770298e+03 -3.019542e+03 -2.713600e+03 7.140133e+02
[41] 1.213013e+03 1.675104e+308 -4.102019e+02 -7.066189e+01 1.594702e+03 2.080046e+03 -9.123167e+02 -1.638393e+03 1.522476e+03 3.205788e+03
[51] 1.013935e+03 -1.991669e+03 1.058971e+02 1.208734e+03 1.682995e+02 1.185065e+03 -4.807388e+02 -1.325115e+02 -1.054011e+03 9.496538e+02
[61] -6.765683e+02 1.571624e+03 -4.152574e+02 7.013821e+02 -2.263409e+03 2.481352e+03 3.068976e+02 4.567761e+01 1.369278e+03 3.432150e+02
[71] -1.568650e+03 -2.354675e+03 1.359262e+03 4.075351e+02 4.580821e+02 3.115040e+02 1.578129e+03 -4.375166e+02 1.500529e+03 -2.012224e+03
[81] -7.396443e+02 -2.214834e+02 3.192494e+02 -2.181367e+03 -1.800501e+03 -5.403015e+02 3.950535e+02 -1.170728e+03 -1.909496e+02 -3.134817e+03
[91] 8.004728e+02 3.885544e+02 -3.052482e+03 -3.190670e+01 1.415198e+03 -7.216340e+02 -1.284554e+03 -1.700180e+03 -7.757114e+02 7.399581e+02
The issue appears to be related to the underlying floating point operations, or at least something related to R's underlying linear algebra libraries. It is not due to the random number generation, since I see the same issue when performing linear algebra operations on a fixed proprietary dataset (that I can't share) without drawing any random variables. But I don't know how to investigate this further - so any pointers would be appreciated!
I am currently running R 3.5.3 on RStudio 1.2.1335 (on MacOS Sierra 10.12.6) but I also managed to reproduce this on a later version of R on the same computer!
Here's the result of running sessionInfo()
:
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] SGL_1.3 glmnetUtils_1.1.4 gplots_3.0.1.1 RColorBrewer_1.1-2 dplyr_0.8.3 lattice_0.20-38 Rfast_1.9.8
[8] RcppZiggurat_0.1.5 Rcpp_1.0.3 pracma_2.2.9
loaded via a namespace (and not attached):
[1] rstudioapi_0.10 magrittr_1.5 tidyselect_0.2.5 R6_2.4.1 rlang_0.4.2 foreach_1.4.7 stringr_1.4.0
[8] caTools_1.17.1.3 tools_3.5.3 grid_3.5.3 glmnet_2.0-18 KernSmooth_2.23-15 iterators_1.0.12 gtools_3.8.1
[15] assertthat_0.2.1 tibble_2.1.3 crayon_1.3.4 Matrix_1.2-15 purrr_0.3.3 codetools_0.2-16 bitops_1.0-6
[22] glue_1.3.1 stringi_1.4.3 gdata_2.18.0 compiler_3.5.3 pillar_1.4.3 pkgconfig_2.0.3