
I have a 30K × 30K distance/dissimilarity matrix that is calculated in a loop and stored on disk.

I would like to do clustering over the matrix. I import and cluster it as below:

Mydata <- read.csv("Mydata.csv")
Mydata <- as.dist(Mydata)
Results <- hclust(Mydata)

But when I convert the matrix to a dist object, I get a RAM limitation error. How can I handle this? Can I run the hclust algorithm in a loop/chunking fashion, i.e. divide the distance matrix into chunks and process them in a loop?
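A quick back-of-envelope calculation (a sketch, assuming the values are stored as standard 8-byte doubles) shows why the conversion runs out of memory: the square matrix and the lower-triangle dist object are both multi-gigabyte, before hclust makes any internal copies.

```r
# Rough memory estimate for a 30K x 30K dissimilarity matrix (8-byte doubles)
n <- 30000
full_matrix_bytes <- n * n * 8          # square matrix as read from the CSV
dist_bytes <- n * (n - 1) / 2 * 8       # lower triangle kept by as.dist()
round(full_matrix_bytes / 2^30, 1)      # ~6.7 GiB
round(dist_bytes / 2^30, 1)             # ~3.4 GiB
```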

Amessihel
A. Bek

1 Answer


You may try the following:

Mydata <- read.csv("Mydata.csv")
Mydata <- as.matrix(Mydata)   # coerce the data frame to a matrix first
Mydata <- as.dist(Mydata)
Results <- hclust(Mydata)

Read the following to understand how R uses memory in your session: http://adv-r.had.co.nz/memory.html

The fastcluster package might be helpful in general: https://cran.r-project.org/web/packages/fastcluster/ See also this question: hclust() in R on large datasets
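As a sketch of the fastcluster route (assuming the package is installed; loading it masks stats::hclust with a faster, more memory-efficient implementation, so the call looks identical), on toy data:

```r
library(fastcluster)   # masks stats::hclust with the fastcluster version

set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100)  # toy data standing in for the real rows
d <- dist(X)                             # dissimilarity object, as in the question
Results <- hclust(d, method = "complete")
plot(Results)                            # dendrogram, same as with stats::hclust
```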

It also depends on your OS, but you may be able to raise the RAM limit (or simply run this code on someone else's computer with more RAM, store the result with saveRDS, and then read it back on your own computer with readRDS).
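The saveRDS/readRDS round trip might look like the following sketch (file name and toy data are hypothetical; the hclust result is tiny compared to the dist object, so only the small .rds file needs to travel between machines):

```r
# Toy stand-in for the real dist object, built on the machine with enough RAM
set.seed(1)
d <- dist(matrix(rnorm(50 * 3), nrow = 50))
Results <- hclust(d)

saveRDS(Results, "Results.rds")     # serialize the clustering result to disk
# ... copy Results.rds to your own computer, then:
Results2 <- readRDS("Results.rds")  # restore the identical object
plot(Results2)
```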

Tal Galili