
I'm comparing mass spec peaks to create a molecular dendrogram in RStudio. I have 88,336 elements, which take up 48.2 MB of memory in total. I am running this on a desktop with 64 GB RAM and an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz.

I am calculating the distances of the peaks in the igraph network `net` with `net.dist <- distances(net)`, and R crashes with the message "R session aborted. R encountered a fatal error. The session was terminated."

I don't know enough about computers to remedy this issue. I assume it's because there are so many peaks to calculate, but I also assume the desktop should be able to handle them?

My RStudio session is only using 7.56 GiB when the crashes happen. The C: drive has 513 GB free.

  • Can't answer the R session aborted question directly. But have you run your code with a small subset of your `net` object to ensure parameter compliance in the `distances` function call? – SteveM Jun 29 '22 at 17:48
  • Hi @SteveM, I did run a subset; sorry for not mentioning it. With a small portion of the data, the code works great. – Geomicro Jun 29 '22 at 17:51
  • Do you suppose separating the `net` object into two objects then remerging into a new `igraph` object would work? – Geomicro Jun 29 '22 at 18:00
  • Not sure how relevant, but [this](https://stackoverflow.com/questions/69385651/shortest-path-calculations-crashes-with-igraph-r-maximum-number-of-nodes) may be a similar issue. – andrewb Jun 29 '22 at 18:09
  • I have not used the `igraph` `distances` function, so I can't say whether an input object to `distances` can first be subsetted and the results combined in a post-process operation. A simple check of computational feasibility is to compare the size of your `net_subset` object that works with the size of the object returned by `distances(net_subset)`. Also note that R by default loads objects into working memory when they are created, so your upper limit of accessible memory is 64 GB minus the memory consumed by the operating system and other applications. – SteveM Jun 29 '22 at 19:05
  • Hi all, thank you for the valuable feedback. My current solution is to separate the MS peaks by metadata and create networks for the different subsets. I believe the core of this issue was that I was exceeding `igraph`'s maximum size. My solution may be unique to my biological system (others may not be able to group their peaks), but it works well. – Geomicro Jun 29 '22 at 19:23

1 Answer


Currently, R/igraph can only handle matrices with at most `2^31 - 1` elements, and will fail without warning when that limit is exceeded. Future versions will be much more robust and won't crash. For a graph with `n` vertices, the distance matrix will have `n*n` elements, so the full distance matrix can't be computed for graphs with more than `n = 46340` vertices.

You can, however, compute the distance matrix piece by piece, by setting the `v` argument of `distances()` to only part of the vertex set.
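
As an illustration (not from the original answer), here is a minimal sketch of that piecewise approach, assuming `net` is the igraph object from the question; the chunk size and the output file names are arbitrary choices:

```r
library(igraph)

# Compute the distance matrix in blocks of rows by passing a subset of
# vertex IDs to the `v` argument of distances(). Each call returns a
# (chunk size) x vcount(net) matrix instead of the full n x n matrix.
chunk_size <- 5000   # illustrative value
n <- vcount(net)
chunks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))

for (i in seq_along(chunks)) {
  d_chunk <- distances(net, v = chunks[[i]])
  # Process or save each block before moving on, so the full n x n
  # matrix never has to be held in memory at once.
  saveRDS(d_chunk, sprintf("net_dist_chunk_%03d.rds", i))
}
```

With 88,336 vertices, each 5000-row block is about 5000 × 88336 × 8 bytes ≈ 3.3 GiB, which fits comfortably in 64 GB of RAM.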


Note that the limitation of no more than `2^31 - 1` matrix elements comes from R's use of 32-bit integers, even on 64-bit platforms. Also note that storing this many elements already takes up about 16 GiB of memory. The full distance matrix for 88,336 vertices would take up about 58 GiB of memory, and would take a rather long time to compute.
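
For a quick back-of-the-envelope check of those figures (R stores the distance matrix as 8-byte doubles):

```r
n <- 88336
n * n                    # 7,803,248,896 elements, far above 2^31 - 1 = 2,147,483,647
n * n * 8 / 2^30         # ~58 GiB for the full distance matrix
(2^31 - 1) * 8 / 2^30    # ~16 GiB already at the 2^31 - 1 element limit
```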

Szabolcs