
I have a fairly simple task, but I can't finish it because of memory problems, so I am wondering whether there is a more efficient way to do it. I have a big data.frame that looks like this:

[image of the data frame not preserved]

This data frame is called sp_df, and I calculate the distance between every pair of points in R. The problem is that, because the result is so big, I can't convert it to a matrix and melt it. That's where I am stuck.

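library(reshape2)  # assumption: melt() comes from reshape2

sp_df <- read.csv("Euclidean_80K_Spots.csv", header = TRUE)
sp_dist <- dist(sp_df)
# this is the step that runs out of memory
sp_dist_m <- melt(as.matrix(sp_dist), varnames = c("ID", "neig"))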

The dist object is 16.2 Gb, and I cannot split the data into smaller chunks because of the underlying biology. All I can think of is filtering the dist object to keep only distances < 2, but I don't know how to do that. Any help would be much appreciated! Thanks!

  • related: https://stackoverflow.com/questions/59907035/memory-efficient-method-to-create-dist-object-from-distance-matrix – dww Sep 17 '20 at 13:31
  • Does `memory.limit()` help? – Christoph Sep 17 '20 at 13:32
  • Not really, because I get this message: ```Warning message: 'memory.limit()' is Windows-specific``` – Amaranta_Remedios Sep 17 '20 at 13:33
  • See https://stat.ethz.ch/R-manual/R-patched/library/base/html/Memory-limits.html – Christoph Sep 17 '20 at 13:35
  • also https://stackoverflow.com/questions/26958646/calculate-euclidean-distance-matrix-using-a-big-matrix-object/31615523#31615523 – dww Sep 17 '20 at 13:36
  • Does this answer your question? [Calculate Euclidean distance matrix using a big.matrix object](https://stackoverflow.com/questions/26958646/calculate-euclidean-distance-matrix-using-a-big-matrix-object) – Christoph Sep 17 '20 at 13:37
  • Not working ```Error: Package 'bigmemory' referenced from Rcpp::depends in source file euc_dist.cpp is not available.``` – Amaranta_Remedios Sep 17 '20 at 13:45
  • Why do you need to turn it into a matrix and melt? It would be best to avoid that and do the next step (whatever that is) directly with the dist object. – Roland Sep 17 '20 at 13:49
  • The next step, assuming I had a matrix with three columns (row, col, dist), was to subset for dist < 2. – Amaranta_Remedios Sep 17 '20 at 13:57
  • Well, I don't think melting is necessary if your goal is identifying these. Better do the subsetting with the sparse dist object. – Roland Sep 18 '20 at 06:30
  • Yes, I think that's a good idea, but I am just lost trying to figure out how to do that. – Amaranta_Remedios Oct 02 '20 at 12:25
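
The subsetting suggested in the comments (keeping only pairs with dist < 2 without ever building the full matrix) could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the real data: pairs_below is a hypothetical helper name, the output columns ID, neig and value simply mirror what the melt() call in the question would produce, and it assumes the dist object itself still fits in memory, as the question describes. It relies on the fact that a dist object stores the lower triangle column by column, so the entries for point i against points i+1, ..., n sit in one contiguous slice.

```r
# Hypothetical helper: list every pair in a "dist" object whose distance is
# below `cutoff`, walking the object one column at a time so that no
# n x n matrix is ever created.
pairs_below <- function(d, cutoff) {
  n <- attr(d, "Size")
  labels <- attr(d, "Labels")
  if (is.null(labels)) labels <- seq_len(n)

  ids   <- vector("list", n - 1L)
  neigs <- vector("list", n - 1L)
  vals  <- vector("list", n - 1L)

  offset <- 0  # kept as a double so it cannot overflow for large n
  for (i in seq_len(n - 1L)) {
    len <- n - i                                   # pairs (i, i+1), ..., (i, n)
    slice <- as.numeric(d[offset + seq_len(len)])  # distances for column i
    hit <- which(slice < cutoff)
    if (length(hit) > 0L) {
      ids[[i]]   <- rep(i, length(hit))
      neigs[[i]] <- i + hit
      vals[[i]]  <- slice[hit]
    }
    offset <- offset + len
  }

  data.frame(
    ID    = labels[as.integer(unlist(ids))],
    neig  = labels[as.integer(unlist(neigs))],
    value = as.numeric(unlist(vals))
  )
}

# Toy example with made-up coordinates (not the real spots file)
pts <- data.frame(x = c(0, 1, 5, 5.5), y = c(0, 0, 0, 0))
close_pairs <- pairs_below(dist(pts), cutoff = 2)
print(close_pairs)
```

Each pair appears once (ID before neig), whereas melting the full matrix would list every pair twice plus the zero diagonal. On the real data this would be called as pairs_below(sp_dist, 2); only the pairs below the cutoff are kept, so the result should be far smaller than the melted matrix if most spots are further than 2 apart.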

0 Answers