1

I have n different length cell vectors, call it c{i}, i=1,2,...,n.

I wanna know if there any c{j} is the subset of c{i}, for example:

c{1}=[1 2 3 4 5 6]; c{2}=[1 3 5 7];c{3}=[2 4 6 8];
c{4}=[1 4 6];c{5}=[3 7];

then I hope I can find that c{4} is subset of c{1}, c{5} is subset of c{2}.

I used two for loops with intersect function can process it, but I hope I can use at most one loop to process it, is there any way can achieve it?

Yevis
  • 63
  • 5

3 Answers3

1

Here's a option using nchoosek – like cellfun it's also a loop in disguise of course:

c{1} = [1 2 3 4 5 6];
c{2} = [1 3 5 7];
c{3} = [2 4 6 8];
c{4} = [1 4 6];
c{5} = [3 7];
combs = nchoosek(1:numel(c),2);
subC = cell(size(combs,1),1);
for i = 1:size(combs,1)
    subC{i} = intersect(c{combs(i,:)});
end

which results in the cell array

subC = 

    [1x3 double]
    [1x3 double]
    [1x3 double]
    [         3]
    [1x0 double]
    [         1]
    [1x2 double]
    [1x2 double]
    [1x0 double]
    [1x0 double]

Each cell in subC corresponds to intersection of the cells indices in combs (a matrix form could easily be built within the loop if that is preferred).

EDIT: If you simply want to know if one vector is a subset of another then you can either use subC and combs from above to determine this or calculate it directly

combs = nchoosek(1:numel(c),2);
isSubC = logical(eye(numel(c)));
for i = 1:size(combs,1)
    subC = intersect(c{combs(i,:)});
    isSubC(combs(i,1),combs(i,2)) = isequal(subC,c{combs(i,2)});
    isSubC(combs(i,2),combs(i,1)) = isequal(subC,c{combs(i,1)});
end

where isSubC(i,j) specifies if c{j} is a subset of c{i}.

horchler
  • 18,384
  • 4
  • 37
  • 73
  • thank you horchler, it's a nice work, but how can I check c{j} is the subset of c{i} from the subC? – Yevis Aug 17 '13 at 23:23
  • See my edit. You did ask for the "... intersection of many different length cell vectors," which is what the first case gives. As I indicated, `combs` and `subC` can be used to to find subsets. One way is with `isequal(subC{find(combs(:,1)==i&combs(:,2)==j)},c{j})` where `j` must be > `i`. – horchler Aug 18 '13 at 01:12
  • Hi, horchler, May I count the number of subset of c{i} with this loop? – Yevis Aug 18 '13 at 02:07
  • Of course. Have you tried it? You can do it in the loop or you can do it after the fact with `nnz(isSubC)` (note that I have made sets be subsets of themselves so you may need to subtract `numel(c)` from the total if you don't want to include them). In the loop you can just create a counter variable and add `isequal(subC,c{combs(i,2)})` and `isequal(subC,c{combs(i,1)})` to it on each iteration. – horchler Aug 18 '13 at 02:18
  • Hi horchler, when I test my vectors with your code, I found a problem, – Yevis Aug 18 '13 at 02:50
  • some of c{i} is empty, so these empty vectors are the subsets of most c{i}, I try to fix it, but failed, can you fix it for me? – Yevis Aug 18 '13 at 02:52
  • Have you run the empty set data using my answer? `isSubC` won't match `[]` like with @Amro's. – horchler Aug 18 '13 at 03:17
  • Sorry, I made a mistake, I mean the diagonals of isSubC is equal 1, but it is not a problem, thank you! – Yevis Aug 18 '13 at 03:29
  • Your code is very nice, but I need a faster algorithm, so I choose Amro's code, the reason I think is intersect should call ismember function, so it slower than call ismember directly. Thanks again! – Yevis Aug 18 '13 at 03:48
  • @Yevis: No, in R2012a+ `intersect` calls `unique`, not `ismember`. `ismember` calls `unique` too. And note that there is only only one call to `intersect` in my code as opposed to two `ismember` calls. But only you know how fast it is on your computer (assuming that you time it properly). You question didn't indicate that you cared about speed, just simplifying the looping. If you *really* wanted speed then you would use simple `for` loops. – horchler Aug 18 '13 at 16:19
1

Building on the other answers, you can also use ismember:

sets = {[1 2 3 4 5 6], [1 3 5 7], [2 4 6 8], [1 4 6], [3 7]};

N = numel(sets);          % number of sets
idx = nchoosek(1:N,2);    % indices of combinations
subsets = false(N,N);
for i = 1:size(idx,1)
    a = idx(i,1); b = idx(i,2);

    % check that set A is a subset of B, and the other way around as well
    subsets(a,b) = all(ismember(sets{a},sets{b}));
    subsets(b,a) = all(ismember(sets{b},sets{a}));
end

We get a logical matrix:

>> subsets
subsets =
     0     0     0     0     0
     0     0     0     0     0
     0     0     0     0     0
     1     0     0     0     0
     0     1     0     0     0

where non-zeros indicate subset relationship:

>> [i,j] = find(subsets)
i =
     4
     5
j =
     1
     2

i.e c{4} is a subset of c{1}, and c{5} is a subset of c{2}

Note: it is obvious that any set is a subset of itself, so the diagonals of subsets matrix should also be made 1. You could add that if you want using:

subsets(1:N+1:end) = true;
Amro
  • 123,847
  • 25
  • 243
  • 454
  • Hi Amro, thank you very much! I have another question, may I count the number of subset of c{i} in your loop? – Yevis Aug 18 '13 at 02:14
  • I have a question, when I test your code for my c{i}, i=1,2,...,n, – Yevis Aug 18 '13 at 02:45
  • I found some of c{i} is empty, so these empty cell vectors tend to be the subset of most of c{i}, can you fix it? thank you! – Yevis Aug 18 '13 at 02:49
  • 1
    technically an [empty set](http://en.wikipedia.org/wiki/Empty_set) is a subset of any other set: `∀A: ∅⊆A`, so the result is correct. I'm not sure I understand the first question, do you mean to count the number of other sets to which `c{i}` is a subset of? – Amro Aug 18 '13 at 03:03
  • it is either the rows sums or the column sums, depending on what you mean by "how many subsets c{i} has" (I'm still now sure which one you exactly meant). So either `sum(subsets,1)` or `sum(subsets,2)` – Amro Aug 18 '13 at 03:33
  • by the way if you still want to ignore empty sets, you could detect them as: `idx = cellfun(@isempty,sets)` and use those indices to zero out the corresponding rows in `subsets` matrix: `subsets(idx,:)=false` – Amro Aug 18 '13 at 03:37
  • Thank you very much, I test the three method, your is the fastest, that's perfect! – Yevis Aug 18 '13 at 03:45
0

You can by using cellfun, but as stated here, it is not a good idea.

For doing so, with one loop:

c{1}=[1 2 3 4 5 6]; c{2}=[1 3 5 7];c{3}=[2 4 6 8];
c{4}=[1 4 6];c{5}=[3 7];
cSize = numel( c);
isect=cell(1,cSize)
for k=1:cSize
  isect{k}=cellfun(@(in) intersect(in,c{k}),c,'UniformOutput',false);
end

This procedure may be repeated to eliminate the other for:

c{1}=[1 2 3 4 5 6]; c{2}=[1 3 5 7];c{3}=[2 4 6 8];
c{4}=[1 4 6];c{5}=[3 7];
isect=cellfun(@(in) cellfun(@(in2) intersect(in,in2),c,'UniformOutput',false),c,'UniformOutput',false);

isect{i}{j} is the intersection from c{i} to {j}

Note: cellfun will do the loop internally over the cell value, so in fact, you are not removing the loops.


Although this was not the initial question, finding the subsets:

c{1}=[1 2 3 4 5 6]; c{2}=[1 3 5 7];c{3}=[2 4 6 8];
c{4}=[1 4 6];c{5}=[3 7];c{6}=[];
isSubset=cell2mat(cellfun(@(in) cellfun(@(in2) isequal(intersect(in,in2),in)|isempty(in),c),c,'UniformOutput',false)');

Results:

isSubset =

 1     0     0     0     0     0
 0     1     0     0     0     0
 0     0     1     0     0     0
 1     0     0     1     0     0
 0     1     0     0     1     0
 1     1     1     1     1     1

Which returns a boolean if k is a subset of m by doing isSubset(k,m).

Community
  • 1
  • 1
Werner
  • 2,537
  • 1
  • 26
  • 38
  • thank you Werner, you said cellfun will not removing the loops, that is mean the time complexity will not reduce with it, right? – Yevis Aug 17 '13 at 23:25
  • In addition, how can I check c{j} is the subset of c{i} from insect? – Yevis Aug 17 '13 at 23:31
  • @user2692500 As you can see in the linked reference (click in the here and check my answer), the time complexity will get in fact higher when using cellfun with anonymous function (the `@(argument) expression` syntax functions). The `cellfun` seems to be faster only with pure builtin functions. – Werner Aug 17 '13 at 23:37
  • @user2692500 I don't know if horchler solution is faster or slower than 2 loops, you could check it yourself using tic toc as it is shown on the cellfun topic linked… – Werner Aug 17 '13 at 23:38
  • @user2692500: If you're looking for speed, I highly suspect that some simple `for` loops done properly will be faster than anything we come up with here. – horchler Aug 18 '13 at 01:15
  • @user2692500 in fact my answer is wrong, a subset is when the hole subset is present at the set, not just a part. The other solutions are good enough, I just updated mine so that it doesnt mislead you or other people… dont forget to choose an accepted answer and welcome to stackoverflow x) – Werner Aug 18 '13 at 02:27
  • I found some of c{i} is empty, so these empty vectors are the subsets of most c{i}, I try to fix it, but failed, can you fix it for me? – Yevis Aug 18 '13 at 03:03
  • 1
    @user2692500 There you are… now please, you have 3 different solutions, choose one and work it out. And learn from them if you don't understand how exactly they work. – Werner Aug 18 '13 at 03:10
  • 1
    Thanks, all of your methods are very good, you are right cellfun may be not a good choice. – Yevis Aug 18 '13 at 03:26