convert 16.2Gb dist to matrix in R

Question

I have a fairly simple task to do, but I can't finish it because of memory problems, so I am wondering if there is a more efficient way to do it. I have a big data.frame that looks like this:

This data frame is called sp_df, and I calculate the distance between every pair of points in R. The problem is that the result is so large that I can't convert it to a matrix and melt it. That's where I am stuck.

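library(reshape2)  # assumed source of melt() for matrices
sp_df <- read.csv("Euclidean_80K_Spots.csv", header = TRUE)
sp_dist <- dist(sp_df)  # pairwise Euclidean distances between rows
# This is the step that runs out of memory: as.matrix() expands the dist object to a full n x n matrix
sp_dist_m <- melt(as.matrix(sp_dist), varnames = c("ID", "neig"))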

The dist object is 16.2 Gb, and I cannot split the data into smaller chunks because of the biology behind it. All I could do is filter the dist object and keep only distances < 2, but I don't know how to do that. Any help would be much appreciated! Thanks!
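
One possible way to keep only distances < 2 without ever building the full matrix is to compute the distances one block of rows at a time and discard everything above the cutoff immediately. The following is only a sketch under some assumptions: `pairs_within`, `cutoff` and `block` are hypothetical names, every column of sp_df is assumed to be a numeric coordinate, and the distance is assumed to be Euclidean (the default of dist). Because it uses the expansion |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, values can differ from dist() by rounding error, which may matter for pairs lying almost exactly on the cutoff.

```r
# Hypothetical helper: all pairs (ID < neig) closer than `cutoff`,
# computed one block of rows at a time so no n x n matrix is ever stored.
pairs_within <- function(x, cutoff = 2, block = 500L) {
  x <- as.matrix(x)                 # assumes all columns are numeric coordinates
  n <- nrow(x)
  sq <- rowSums(x^2)                # squared norm of every point, reused per block
  out <- list()
  for (start in seq(1L, n, by = block)) {
    rows <- start:min(start + block - 1L, n)
    # Squared distances from this block to all points: |a|^2 + |b|^2 - 2 a.b
    d2 <- outer(sq[rows], sq, "+") - 2 * tcrossprod(x[rows, , drop = FALSE], x)
    d2[d2 < 0] <- 0                 # clamp tiny negatives caused by rounding
    hit <- which(d2 < cutoff^2, arr.ind = TRUE)
    i <- rows[hit[, 1]]             # global row index of the first point
    j <- hit[, 2]                   # global row index of the second point
    keep <- i < j                   # drop self-pairs and mirrored duplicates
    out[[length(out) + 1L]] <- data.frame(
      ID = i[keep],
      neig = j[keep],
      value = sqrt(d2[hit[keep, , drop = FALSE]])
    )
  }
  do.call(rbind, out)
}
```

With the data from the question this would be called as `sp_close <- pairs_within(sp_df, cutoff = 2)`. Each block holds a few temporary matrices of `block` x n doubles, so `block` can be lowered if memory is still tight.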

related: https://stackoverflow.com/questions/59907035/memory-efficient-method-to-create-dist-object-from-distance-matrix — dww, Sep 17 '20 at 13:31
not really because I get this message ```Warning message: 'memory.limit()' is Windows-specific ``` — Amaranta_Remedios, Sep 17 '20 at 13:33
See https://stat.ethz.ch/R-manual/R-patched/library/base/html/Memory-limits.html — Christoph, Sep 17 '20 at 13:35
also https://stackoverflow.com/questions/26958646/calculate-euclidean-distance-matrix-using-a-big-matrix-object/31615523#31615523 — dww, Sep 17 '20 at 13:36
Does this answer your question? [Calculate Euclidean distance matrix using a big.matrix object](https://stackoverflow.com/questions/26958646/calculate-euclidean-distance-matrix-using-a-big-matrix-object) — Christoph, Sep 17 '20 at 13:37
Not working ```Error: Package 'bigmemory' referenced from Rcpp::depends in source file euc_dist.cpp is not available.``` — Amaranta_Remedios, Sep 17 '20 at 13:45
Why do you need to turn it into a matrix and melt? It would be best to avoid that and do the next step (whatever that is) directly with the dist object. — Roland, Sep 17 '20 at 13:49
the next step assuming I had a matrix with three columns (row, col, dist) was to subset for dist < 2. — Amaranta_Remedios, Sep 17 '20 at 13:57
Well, I don't think melting is necessary if your goal is identifying these. Better do the subsetting with the sparse dist object. — Roland, Sep 18 '20 at 06:30
Yes, I think that's a good idea, but I am lost trying to figure out how to do that. — Amaranta_Remedios, Oct 02 '20 at 12:25
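
For the subsetting suggested in the comments, one option is to scan the dist object directly. A dist object is a plain numeric vector holding the lower triangle column by column, so the position of each value can be mapped back to its pair of row indices. This is a rough sketch, not a tested solution for the 16.2 Gb case: `dist_pairs_below` and `chunk` are hypothetical names, the object is assumed to contain no NA values, and the scan runs in slices so the temporary logical vector stays small.

```r
# Hypothetical helper: pairs stored in a "dist" object with distance < cutoff.
dist_pairs_below <- function(d, cutoff = 2, chunk = 1e7) {
  n <- attr(d, "Size")              # number of points
  len <- length(d)                  # n * (n - 1) / 2, may exceed 2^31 - 1
  k <- numeric(0)                   # positions of the values below the cutoff
  for (s in seq(1, len, by = chunk)) {
    idx <- seq(s, min(s + chunk - 1, len))
    k <- c(k, idx[d[idx] < cutoff])
  }
  # The vector stores column 1 (rows 2..n), then column 2 (rows 3..n), ...
  # starts[i] is the number of values stored before column i; as.numeric()
  # avoids integer overflow in cumsum() when n is large.
  starts <- c(0, head(cumsum(as.numeric((n - 1):1)), -1))
  i <- findInterval(k - 1, starts)  # column index (smaller row number)
  j <- k - starts[i] + i            # row index (larger row number)
  data.frame(ID = i, neig = j, value = as.numeric(d[k]))
}
```

Used as `sp_close <- dist_pairs_below(sp_dist, cutoff = 2)`, this returns the three columns (ID, neig, value) that the melt step was meant to produce, restricted to distances < 2 and with each pair listed once.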


0 Answers