I want to perform hierarchical clustering on a dataset with 107721 rows and 16 columns. To do this I have to compute the distance matrix, but when I call the dist function I get this error:

Error: cannot allocate vector of size 43.2 Gb
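
For context, the size in the error can be reproduced with a little arithmetic. This is a minimal sketch, assuming dist stores only the lower triangle of the matrix as 8-byte doubles and that R reports "Gb" in units of 1024^3 bytes; the variable names are only illustrative.

```r
# Number of rows in the dataset (from the question).
n <- 107721

# dist() keeps one value per unordered pair of rows: n * (n - 1) / 2.
n_pairs <- n * (n - 1) / 2

# Each value is a double (8 bytes); convert bytes to units of 1024^3.
gib <- n_pairs * 8 / 2^30

print(round(gib, 1))  # 43.2
```

The number of columns does not enter the calculation: the memory needed grows with the square of the number of rows, so bringing the size down means working with fewer rows, not fewer columns.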

I have searched a lot and haven't found a solution yet. I also saw that errors like this can be handled with the bigmemory package. I read its documentation, but I didn't understand how to use its functions to solve this problem.

Does anyone know how to solve this problem?

Thanks in advance!

Billy
  • Your data is too big to compute the distance matrix. Either use an approximation, if one exists, or reduce your data size. – user2974951 Sep 06 '22 at 10:13
  • You can try the [disk frame package](https://diskframe.com/). It is much slower than doing your computations in RAM, and it may be hard on your SSD, but if those limitations are not an issue it provides a lot of utility. – D.J Sep 06 '22 at 11:40
  • @user2974951 I saw [this](https://stackoverflow.com/questions/40989003/hclust-in-r-on-large-datasets), installed the Rclusterpp package, and used Rclusterpp.hclust to do the hierarchical clustering. – Billy Sep 06 '22 at 20:09
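
The "reduce your data size" suggestion in the comments can be sketched in base R as a two-stage approach: compress the rows into a small number of k-means centers, then run hclust on the centers only. This is one common workaround, not something the commenters prescribed. The synthetic matrix, the number of centers, the number of final clusters, and the Ward linkage are all assumptions made for illustration, and the sketch assumes every column is numeric.

```r
set.seed(42)

# Synthetic stand-in for the real data (assumption: 16 numeric columns).
x <- matrix(rnorm(5000 * 16), ncol = 16)

# Step 1: compress the rows into a manageable number of k-means centers.
n_centers <- 50   # hypothetical value; tune for your data
km <- kmeans(scale(x), centers = n_centers, iter.max = 50)

# Step 2: hierarchical clustering on the centers only (50 x 50 distances).
hc <- hclust(dist(km$centers), method = "ward.D2")

# Step 3: cut the tree and propagate the labels back to every original row.
k <- 5            # hypothetical number of final clusters
center_labels <- cutree(hc, k = k)
row_labels <- unname(center_labels[km$cluster])

table(row_labels)
```

The distance matrix is now computed over n_centers points instead of all rows, so its size is controlled by a parameter you choose. The trade-off is that the dendrogram describes the centers rather than the individual observations.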
