I have the following matrix in R, but with about 22k rows:

> unbalanced
      row  col
[1,]    1    3
[2,]    4    5
[3,]    4    6
[4,]    5    6
[5,]    9   10
[6,]    ...

Is there a way to divide this matrix into sets of intersecting data? What I mean is: if any rows share a value, I want a single set containing the union of those rows. Ideally, I would end up with a list of non-intersecting sets that together contain all the data from my original matrix.

Something like this (based on the above example):

[1,] 1  3
[2,] 4  5  6
[3,] 9  10

I have implemented something similar in Python in the past (using iteration), but R is quite a different "beast", and the way I perceive it, iteration constructs such as for loops can and should be avoided.

Thank you in advance for any pointers you may provide.

Update:

Following @A. Webb's suggestion, aggregate(col~row,unbalanced,FUN=list) gets me close to what I want, but there is still a missing detail that might not have been evident from the original question. That solution returns a list of sets that still share elements with one another (what I called overlapping sets in the comments section). To illustrate, further down in the list I get this:

        row   col
...
[160,]  160   c(161, 162, 194, 559, 1195)
[161,]  161   c(162, 194, 559, 1195)
...

What I need is the union of these two sets, since their intersection is not empty (≠ ∅). I should also add that I do not need the column named "row", so any solution that discards it is fine for me. I only need a list of the sets.
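
To make the requirement concrete, here is a minimal base R sketch of that union step. It is not taken from any answer: it swaps in a simple union-find (disjoint-set) pass over the rows, and `merge_sets` and `find_root` are hypothetical helper names. It does use a `for` loop, so treat it as an illustration of the expected result rather than an idiomatic or tuned solution.

```r
# Example rows from the question (the real matrix has about 22k rows).
unbalanced <- cbind(row = c(1, 4, 4, 5, 9), col = c(3, 5, 6, 6, 10))

# merge_sets() is a hypothetical helper, not part of any package.
# It treats every row as a link between two values and groups all
# values that are linked directly or indirectly.
merge_sets <- function(edges) {
  ids    <- sort(unique(as.vector(edges)))  # distinct values in the matrix
  parent <- seq_along(ids)                  # each value starts as its own root
  a <- match(edges[, 1], ids)               # index of each "row" value in ids
  b <- match(edges[, 2], ids)               # index of each "col" value in ids

  # Follow parent links until reaching a value that is its own parent.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Link the two groups of every row (no path compression, for brevity).
  for (k in seq_len(nrow(edges))) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # Group the values by their final root and drop the root labels.
  roots <- vapply(seq_along(ids), find_root, integer(1))
  unname(split(ids, roots))
}

sets <- merge_sets(unbalanced)
print(sets)
```

On the five example rows this yields three sets: 1 and 3; 4, 5 and 6; 9 and 10. The "row" column is discarded, and rows such as 160 and 161 above would end up in one set because they share values.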

Stunts
    The clarified problem sounds like you are trying to identify what are called connected components in graph theory. You have a sparse matrix representation of an adjacency matrix (see `sparseMatrix` from `Matrix`), from which you wish to identify connected components (see `clusters` from `igraph`, which is compatible with `Matrix`) – A. Webb Mar 03 '16 at 12:21
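
A rough sketch of the connected-components idea from the comment, assuming the `igraph` package is installed. It builds the graph directly from the two-column matrix with `graph_from_edgelist` instead of going through `sparseMatrix`, and it uses `components`, the current name for what the comment calls `clusters`. The values are converted to character so that igraph treats them as vertex names; with a numeric matrix it would also create isolated vertices for unused ids such as 2, 7 and 8.

```r
library(igraph)

# Example rows from the question (the real matrix has about 22k rows).
unbalanced <- cbind(row = c(1, 4, 4, 5, 9), col = c(3, 5, 6, 6, 10))

# Use the values as vertex names so only values that actually occur
# in the matrix become vertices.
el <- matrix(as.character(unbalanced), ncol = 2)
g  <- graph_from_edgelist(el, directed = FALSE)

# membership assigns a component number to every vertex.
memb <- components(g)$membership

# Split the vertex names by component and convert them back to numbers.
sets <- unname(split(as.numeric(V(g)$name), memb))
sets <- lapply(sets, sort)
print(sets)
```

Each element of `sets` is one group of values that are linked directly or indirectly, which matches the union of overlapping sets asked for in the update.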
