
I have a large numeric matrix `bd` (25k by 25k); here is a sample of its contents:

> bd[1:5, 1:5]
          [,1]      [,2]     [,3]     [,4]      [,5]
[1,] 1.8121698 0.0000000 0.000000 0.000000 0.0000000
[2,] 2.0770875 2.0159531 0.000000 0.000000 0.0000000
[3,] 0.3688963 1.6658982 2.299720 0.000000 0.0000000
[4,] 1.5845495 0.4880238 1.538353 1.536267 0.0000000
[5,] 1.1428052 0.7087784 1.656545 1.077034 0.4592339

The object is 4.7 GB, so to save space I want to reduce it to a vector containing only the lower-triangle values.

However, when I run `smallbd <- bd[lower.tri(bd, diag = TRUE)]` I get `Error: cannot allocate vector of size 2.3 Gb`.
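A quick size check shows why the 2.3 Gb figure appears. This assumes R's usual storage sizes (8 bytes per double, 4 bytes per logical) and that R's "Gb" means 2^30 bytes. Under those assumptions, the n x n logical mask built by `lower.tri()` and the final lower-triangle vector each come to about 2.33 Gb, so either allocation could be the one that fails. The result vector alone still needs roughly half the memory of `bd`.

```r
# Back-of-the-envelope sizes for n = 25000, in GiB (R prints 2^30 bytes as "Gb")
n   <- 25000
gib <- 2^30
full_matrix <- n^2 * 8 / gib              # double n x n matrix bd: about 4.66
tri_mask    <- n^2 * 4 / gib              # logical n x n mask from lower.tri(): about 2.33
tri_values  <- n * (n + 1) / 2 * 8 / gib  # lower triangle incl. diagonal as doubles: about 2.33
round(c(full_matrix, tri_mask, tri_values), 2)
```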

Any suggestions on how else I can extract the lower triangle within the memory constraint?
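One way to avoid building the full-size logical mask is to fill a preallocated vector column by column. The following is a minimal sketch and has not been tried on a 25k x 25k matrix. `lower_tri_vec` is a hypothetical helper name. The output vector (about 2.3 Gb here) must still fit in memory alongside `bd`, but each temporary is only a single column slice. The values come out in the same column-major order as `bd[lower.tri(bd, diag = TRUE)]`.

```r
# Hypothetical helper: copy the lower triangle (diagonal included) of a
# square matrix into a preallocated vector, one column at a time, so no
# n x n logical mask from lower.tri() is ever created.
lower_tri_vec <- function(m) {
  n <- nrow(m)
  stopifnot(n == ncol(m))
  out <- numeric(n * (n + 1) / 2)   # final result; still the dominant allocation
  pos <- 0
  for (j in seq_len(n)) {
    len <- n - j + 1                # entries on or below the diagonal in column j
    out[pos + seq_len(len)] <- m[j:n, j]
    pos <- pos + len
  }
  out
}
```

Usage would be along the lines of `smallbd <- lower_tri_vec(bd)`, followed by `rm(bd); gc()` once the copy has succeeded.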

  • This might help: http://stackoverflow.com/questions/20898684/how-to-efficiently-generate-lower-triangle-indices-of-a-symmetric-matrix –  Feb 04 '16 at 07:30
  • How do you create this matrix? What do you intend to do with it? Possibly a sparse matrix (from package Matrix) could be an option? – Roland Feb 04 '16 at 08:47
  • Thanks Pascal, will try that. @Roland it's a distance table between 25k points, adapted from the answer at http://stackoverflow.com/questions/26958646/calculate-euclidean-distance-matrix-using-a-big-matrix-object/31615523#31615523 . – Ricky Feb 04 '16 at 09:16
  • @Ricky If you use a self-written function why do you store those zeros? You should take a lesson from the `dist` function. – Roland Feb 04 '16 at 09:20
  • I used `dist`; it worked for smaller matrices but failed due to memory limitations once I reached about 25k by 25k. I'm experimenting with the approach from the other answer using `bigmemory`. But we are getting away from the topic of my question. I'm interested in how to get the triangle of a big matrix like this regardless of its source; it could have been read from a huge csv. – Ricky Feb 04 '16 at 09:36
  • `lower.tri` actually creates another 25k x 25k matrix. And the assignment creates at least one temporary matrix before the assignment is complete. If you want to save space, then look at the functions in the Matrix package. I'm quite sure it has a triangular type. – IRTFM Feb 04 '16 at 10:12
  • See: Packed Triangular Dense Matrices - "dtpMatrix" in pkg:Matrix – IRTFM Feb 04 '16 at 10:18
  • I did not say you should use `dist`. Your function should return the same kind of object as `dist` (with the same structure and class). – Roland Feb 04 '16 at 12:30
  • Thanks @42 for the suggestion. I was able to create a lower triangle, let's say `ntp`, of class `ntpMatrix`, but it cannot be used as a subscript, i.e. `smallbd <- bd[ntp]` results in `Error in bd[ntp] : invalid subscript type 'S4'`. Does this mean the approach won't work, or did I do something wrong? – Ricky Feb 15 '16 at 05:31
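On the last comment: base R's `[` does not accept S4 objects such as an `ntpMatrix` as subscripts, which is what the error reports. A packed triangular object from Matrix is meant to hold the values itself, not to index `bd`. The sketch below is a small example that assumes the Matrix package (shipped with R as a recommended package). It wraps a column-major vector of lower-triangle values directly as a `dtpMatrix` with `uplo = "L"`. Constructing it with `new()` is one option among several and may copy the data, so memory use on the full 25k matrix is not established here.

```r
library(Matrix)

# Small lower-triangular matrix standing in for bd
m <- matrix(0, 4, 4)
m[lower.tri(m, diag = TRUE)] <- c(1.8, 2.1, 0.4, 1.6, 2.0, 1.7, 0.5, 2.3, 1.5, 1.5)

# Lower triangle incl. diagonal, column-major (same order as lower.tri() indexing)
v <- m[lower.tri(m, diag = TRUE)]

# Packed triangular storage: only the 10 stored values live in the x slot
p <- new("dtpMatrix", x = v, Dim = dim(m), uplo = "L")
```

Here `as.matrix(p)` expands back to the full 4 x 4 matrix, and `p@x` is the packed vector itself.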

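Roland's suggestion of returning the same kind of object as `dist` can be sketched as follows. A `dist` object is a plain numeric vector of the entries strictly below the diagonal, stored column by column, with the attributes `Size`, `Diag`, `Upper`, class `"dist"` and optionally `Labels`. The helper name `as_dist_vec` is hypothetical. This format assumes a zero diagonal, so the nonzero diagonal in the sample above would be dropped.

```r
# Hypothetical helper: wrap a strict-lower-triangle vector (column-major,
# diagonal excluded) as a "dist" object with the attributes stats::dist() uses
as_dist_vec <- function(v, n, labels = NULL) {
  stopifnot(length(v) == n * (n - 1) / 2)
  d <- structure(v, Size = as.integer(n), Diag = FALSE, Upper = FALSE,
                 class = "dist")
  if (!is.null(labels)) attr(d, "Labels") <- labels
  d
}

# Example: 4 points in the plane
pts  <- matrix(c(0, 3, 1, 4, 0, 4, 2, 1), ncol = 2)
full <- as.matrix(dist(pts))
d    <- as_dist_vec(full[lower.tri(full)], nrow(full))
```

Because the object carries class `"dist"`, existing methods such as `as.matrix()` and functions such as `hclust()` accept it without a full n x n matrix being stored.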