I am running R 3.2.3 on a machine with 128 GB of RAM. I have a large matrix of 123028 rows x 168 columns. I would like to use a hierarchical clustering algorithm in R, so first I am trying to create a distance matrix with the vegdist() function from the vegan package, using the Bray-Curtis method (vegdist's default, method = "bray"). I get an error that I assumed was about memory allocation:
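```r
# 123028 x 168 random example data (123028 * 168 = 20668704 values)
df <- as.data.frame(matrix(rnorm(123028 * 168), nrow = 123028))
library(vegan)
mydist <- vegdist(df)  # method = "bray" (Bray-Curtis) is the default
```

```
Error in vegdist(df) : long vectors (argument 4) are not supported in .Fortran
```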
If I use the pryr package to estimate how much memory the distance matrix needs, I get 121 GB, which is less than the 128 GB of RAM I have.
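```r
library(pryr)
# 1:123028^2 is parsed as 1:(123028^2), i.e. one value per cell of a full square matrix
mem_change(x <- 1:123028^2)
#> 121 GB
```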
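For reference, the arithmetic behind that estimate can be checked directly. `1:123028^2` allocates one value per cell of a full square matrix, and because 123028^2 exceeds the integer range, `:` returns doubles (8 bytes each). A `dist` object, which is what `vegdist()` returns, stores only the lower triangle, n(n - 1)/2 values. This is a rough sketch that assumes 8 bytes per double and ignores object overhead; the variable names are only illustrative.

```r
# Number of observations (rows) in the data
n <- 123028

# Full n x n matrix of doubles, roughly what 1:123028^2 allocates
full_bytes <- n^2 * 8

# A "dist" object stores only the lower triangle: n * (n - 1) / 2 doubles
dist_len <- n * (n - 1) / 2
dist_bytes <- dist_len * 8

# Report in decimal gigabytes (1e9 bytes)
cat(sprintf("full matrix: %.1f GB\n", full_bytes / 1e9))  # full matrix: 121.1 GB
cat(sprintf("dist object: %.1f GB\n", dist_bytes / 1e9))  # dist object: 60.5 GB
```

So the distance object itself would need roughly half of the 121 GB estimate, before any temporary copies made during the computation.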
I know there used to be a limit of 2^31 - 1 (about 2.1 billion) elements for a single vector in R, but I thought that limit was removed in recent versions of R. Is there another memory limit I'm not aware of?
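A quick check of the element count against that limit, as a sketch: R 3.0.0 added support for "long vectors" (more than 2^31 - 1 elements), but the error message says long vectors are not supported through `.Fortran`, so the relevant number is how many distances the result would contain. The variable names below are only illustrative.

```r
n <- 123028

# Number of pairwise distances a "dist" object for n rows would hold
n_pairs <- n * (n - 1) / 2

# Largest vector length that does not count as a "long vector"
max_short <- .Machine$integer.max  # 2147483647, i.e. 2^31 - 1

n_pairs > max_short   # TRUE
n_pairs / max_short   # about 3.5 times the limit
```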
In short: what can I do about this error? Is it really caused by memory limits, or am I wrong about that? I would like to stay in R and use a clustering algorithm other than k-means, so I need to compute a distance matrix.