I have a sparse adjacency matrix M of size 12,000 x 12,000 in R that I would like to transfer to another program. I need to convert it to a three-column data.frame, with col1 holding the column name, col2 the row name, and col3 the value M[i,j]. I only want an entry in the data.frame where M[i,j] is not 0 (keeping the logic of the sparse matrix).

I have seen many questions asking how to do the opposite, so I guess it is not that complicated, but I can't find how to do this efficiently.

Thanks for your help

2 Answers

3

First, I'm going to assume that you have a regular sparse matrix, as created via the Matrix package (class dgCMatrix). That is, the nonzero entries are stored column by column, encoded in terms of their values, row indices, and column offsets.
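To make that encoding concrete, here is a minimal sketch on a made-up 3 x 3 matrix (the values are arbitrary and only for illustration). It assumes the Matrix package is installed and relies on `sparseMatrix()` returning the default column-compressed class.

```r
library(Matrix)

# A tiny 3 x 3 example with three nonzero entries (values chosen arbitrarily)
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(10, 20, 30))

class(m)  # "dgCMatrix": compressed sparse column storage

m@x  # nonzero values, stored column by column:  10 20 30
m@i  # zero-based row index of each value:       0 2 1
m@p  # column pointers:                          0 2 2 3
     # column k holds the values at positions p[k]+1 .. p[k+1] of x,
     # so column 2 (p = 2, 2) is empty
```

Because the column index is only implicit in `m@p`, this form is awkward to turn into one row per entry, which is why the triplet form is the more convenient starting point.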

The Matrix package has an alternate representation of a sparse matrix as a set of triplets, where the nonzero values are encoded in terms of their coordinates. This is basically what you want. Converting to this form is easy, as it turns out; and then you can turn it into a data frame.

One wart is that the coordinates are zero-based (i.e., elements in the first row are encoded as row 0), which you may or may not want to convert to one-based.

library(Matrix)
# some sample data
m <- rsparsematrix(12000, 12000, 1e-7)

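# convert to triplet form; coercing to the virtual class "TsparseMatrix"
# avoids the deprecation warning that as(m, "dgTMatrix") gives on Matrix >= 1.5-0
mm <- as(m, "TsparseMatrix")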

# convert to data frame: convert to 1-based indexing
data.frame(i=mm@i + 1, j=mm@j + 1, x=mm@x)

#       i     j     x
#1    144   624  0.16
#2   3898  1106 -1.80
#3  11444  1395  0.89
#4   3981  2300  0.27
#5   3772  3602 -0.42
#6   2674  4058  0.79
#7   4446  4943  0.58
#8   4550  6629  0.82
#9   4125  6867 -0.86
#10  3151  7865 -0.42
#11 11590  8019 -0.96
#12  4808  9428 -1.30
#13 10453 11141  0.39
#14 11112 11592 -1.40

If you want the row/column names as opposed to numbers (this requires the matrix to have dimnames, which the random sample above does not):

data.frame(i=rownames(mm)[mm@i + 1], j=colnames(mm)[mm@j + 1], x=mm@x)
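
As a rough end-to-end sketch of the named case, the snippet below builds a small matrix with made-up dimnames ("r1", "c1", and so on are placeholders, not anything from the question) and emits the columns in the order the question asks for: column name, row name, value. The data frame column names `col`, `row` and `value` are likewise just illustrative.

```r
library(Matrix)

# Small named matrix standing in for the 12,000 x 12,000 one (names are made up)
M <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 3), x = c(10, 20, 30),
  dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3"))
)

# Triplet form: one (i, j, x) entry per nonzero value, zero-based indices
tm <- as(M, "TsparseMatrix")

# Column order requested in the question: column name, row name, value
df <- data.frame(
  col   = colnames(tm)[tm@j + 1],
  row   = rownames(tm)[tm@i + 1],
  value = tm@x
)
df
```

Only the nonzero entries are ever materialised, so the cost scales with the number of nonzeros rather than with the 144 million cells of the full matrix. `summary(M)` on a sparse matrix also returns one-based `i`, `j`, `x` columns and may be a usable shortcut, though it is worth checking its output on your Matrix version.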
Hong Ooi
-1

Under the hood, a dense matrix is just a vector stored in column-major order. You could use which to get the vector indices of the nonzero items and then do some modular arithmetic to reconstruct the row and column indices:

set.seed(123)
M <- matrix(sample(0:2,12,replace = TRUE,prob = c(0.8,0.1,0.1)),nrow = 3)
v <- which(M != 0)
rows <- 1 + (v-1) %% nrow(M)
cols <- 1 + (v-1) %/% nrow(M)
nonzeros <- data.frame(i=rows,j=cols,item=M[v])

In this example:

> M
     [,1] [,2] [,3] [,4]
[1,]    0    2    0    0
[2,]    0    1    2    1
[3,]    0    0    0    0
> nonzeros
  i j item
1 1 2    2
2 2 2    1
3 2 3    2
4 2 4    1
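
A possible shortcut for the same dense case, sketched here as an alternative rather than as part of the original answer, is to let `which()` do the index arithmetic via `arr.ind = TRUE`. The matrix is written out literally so the result does not depend on the random number generator.

```r
# Same dense example as above, filled column by column
M <- matrix(c(0, 0, 0,
              2, 1, 0,
              0, 2, 0,
              0, 1, 0), nrow = 3)

# arr.ind = TRUE returns a two-column matrix of (row, col) positions
idx <- which(M != 0, arr.ind = TRUE)

# A two-column index matrix can be used directly to pull out the values
nonzeros <- data.frame(i = idx[, "row"], j = idx[, "col"], item = M[idx])
nonzeros
```

This gives the same four rows as the modular-arithmetic version. It still scans every cell, so it only suits an ordinary dense matrix that fits in memory.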
John Coleman
  • OP has a _sparse_ matrix, not a dense one – Hong Ooi Oct 05 '18 at 11:04
  • @HongOoi OP's comment about `melt` not being able to work with a large matrix suggested to me at least that he was talking about a literal matrix and was using "sparse" in a mathematical sense, rather than talking about a sparse matrix object as defined in some unspecified package. – John Coleman Oct 05 '18 at 11:07
  • They have a 12k x 12k matrix. That is almost certainly going to be sparse – Hong Ooi Oct 05 '18 at 11:08
  • @HongOoi Hard to know for sure until OP clarifies. In any event, your answer nicely covers that sparse matrix object case. – John Coleman Oct 05 '18 at 11:12
  • OP here: it is indeed a sparse matrix in the sense of R, not the mathematical sense. – Tochoka Oct 05 '18 at 12:14
  • @Tochoka Okay. 1) When posting an R question, please specify any packages that you are using. 2) You should consider marking Hong Ooi's answer as "accepted". 3) I will leave my answer up on the off chance that it might help some future reader who is searching for a way to convert the non-zero entries in a regular R matrix into a dataframe. – John Coleman Oct 05 '18 at 13:57