For the following typical case:
n = 1000000;
r = randi(n,n,2);
(assume that about 0.05% of the numbers are shared between rows; n
could even be in the tens of millions)
I am looking for a CPU- and memory-efficient way to merge rows that share any common item (here, integers). A list of sample implementations in Python is available here, and a quick attempt at translating one into Matlab can be found here.
In my attempts they take ages (minutes to hours), so I am looking for a faster solution.
For the above example, the typical output should look like this (a cell array):
{
[1 90 34 67 ... 9]
[35 89]
[45000 23 828 130 8999 45326 ... 11]
...
}
Note also that I tried to compile the code as a mex file, but this failed because of the missing support for cell arrays in Matlab Coder.
Edit: A tiny demonstration example
%---------------------------------------
clc
n = 100;
r = randi(n,n,2); % random integers in [1,n], n-by-2 matrix
%---------------------------------------
>> r
r =
82 17 % (1) 82 17
91 13 % (2) 91 13
13 32 % (3) 91 13 32 merged with (2), common 13
82 53 % (4) 82 17 53 merged with (1), common 82
64 17 % (5) 82 17 53 64 merged with (4), common 17
...
94 45
13 31 % (77) 91 13 32 31 merged with (3), common 13
57 51
47 52
2 13 % (80) 91 13 32 31 2 merged with (77), common 13
34 80
%---------------------------------------
c = merge(r); % looking for a CPU- and memory-friendly solution
%---------------------------------------
c =
[82 17 53 64]
[91 13 32 31 2]
...
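To make the expected behavior concrete, here is a minimal sketch of what `merge` should compute. It is only one possible reading of the problem: it assumes each row can be treated as an edge between two integers, so that the merged groups are the connected components of the resulting graph. The function name `mergeSketch` is hypothetical, and the sketch relies on the built-in `graph` and `conncomp` functions (assumed available, R2015b or later). It has not been timed at the sizes mentioned above.

```matlab
function c = mergeSketch(r)
% Hypothetical sketch: group the integers in the n-by-2 matrix r so that
% any two rows sharing a number end up in the same group.
% Assumption: this is equivalent to finding connected components.

    % Relabel the integers as 1..numel(u) so the graph has no unused nodes.
    [u, ~, idx] = unique(r(:));
    e = reshape(idx, size(r));

    % Drop repeated edges (in either direction) before building the graph.
    e = unique(sort(e, 2), 'rows');

    % One node per distinct integer, one undirected edge per remaining row.
    G = graph(e(:,1), e(:,2), [], numel(u));

    % bins(k) is the component index of the k-th distinct integer u(k).
    bins = conncomp(G);

    % Collect the original integers of each component into one cell.
    % The order of the numbers inside a cell is not guaranteed.
    c = accumarray(bins(:), u, [], @(x) {x.'});
end
```

Calling `c = mergeSketch(r)` on the demonstration matrix would return a column cell array with one row vector per merged group. The numbers inside each group are not listed in order of first appearance as in the example output above, so the result should be compared as sets.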