
I have two cell arrays of strings of varying lengths, d={'nerve','body','muscle','bone'} and e={'body','body','muscle'}. I have to compare these two arrays and count how many times each string in d occurs in e. The expected result is a vector, count_string=(0,2,1,0). The following is the code I've written, but I get the error: Cell contents assignment to a non-cell array object. I am a beginner in MATLAB programming. Any quick help on this is greatly appreciated.

count_string=size(d)
for i=1:length(d)    
count_string{i}=sum(ismember(e{i},d));
end

After all your suggestions below, this is the module I have.

for i=1:length(d_union)
count_string1=cellfun(@(x) sum(ismember(d1,x)), d_union);
count_string2=cellfun(@(x) sum(ismember(d2,x)), d_union);
count_string3=cellfun(@(x) sum(ismember(d3,x)), d_union);
count_string4=cellfun(@(x) sum(ismember(d4,x)), d_union);
count_string5=cellfun(@(x) sum(ismember(d5,x)), d_union);
count_string6=cellfun(@(x) sum(ismember(d6,x)), d_union);
count_string7=cellfun(@(x) sum(ismember(d7,x)), d_union);
count_string8=cellfun(@(x) sum(ismember(d8,x)), d_union);
count_string9=cellfun(@(x) sum(ismember(d9,x)), d_union);
count_string10=cellfun(@(x) sum(ismember(d10,x)), d_union);
count_string11=cellfun(@(x) sum(ismember(d11,x)), d_union);
count_string12=cellfun(@(x) sum(ismember(d12,x)), d_union);
count_string13=cellfun(@(x) sum(ismember(d13,x)), d_union);
count_string14=cellfun(@(x) sum(ismember(testdoc,x)), d_union);
end

MATLAB is taking forever to execute this module. 'd_union' is a 1x1216 cell array, and each of d1 through testdoc is approximately a 1x240 cell array. I have to calculate the cosine similarity of the vectors I get from the above operation. Is there a way to speed up the process? Please help. Thank you.

user1222437
  • Do you *have* to use strings? It looks like the number of possible strings you have is quite low; couldn't you just replace each string with a number? That would probably speed things up a lot! – Jonas Heidelberg Feb 21 '12 at 18:37
  • Well, I have to use strings. I'm reading from a text file that has several document paragraphs, d1 to d13. I have to use these to perform other calculations. So, I'm not sure replacing each string with a number would work fine for me. Is there any other method? – user1222437 Feb 21 '12 at 18:50
  • You don't need the for-loop; `cellfun` takes care of that. You are just running the same code multiple times with the same result. – yuk Feb 21 '12 at 20:03

3 Answers


You can count occurrences of strings from d in e like this:

count_string = cellfun(@(x) sum(ismember(e,x)), d);

For your sample data you will get the vector [0 2 1 0].

Does the d array contain only unique strings?

UPDATE:

Here is another method that temporarily converts the strings to numbers with GRP2IDX and counts them with HISTC. It assumes all strings in e also exist in d.

[gi g] = grp2idx([d e]);
gn = histc(gi(numel(d)+1:end),1:numel(g));

g will contain the unique strings (probably identical to d) and gn will be the counts. gi is a temporary numeric array used for counting.

You need the Statistics Toolbox to access the GRP2IDX function.
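If the toolbox is not available, a similar string-to-index mapping can probably be had from the built-in UNIQUE. The following is only a sketch of that idea, not the GRP2IDX method itself: it swaps in UNIQUE and ACCUMARRAY, uses the sample d and e from the question, and the variable pos is a name introduced here for illustration. Note that UNIQUE sorts its output, so the counts are mapped back to the order of d at the end.

```matlab
d = {'nerve','body','muscle','bone'};
e = {'body','body','muscle'};

% g holds the sorted unique strings; gi holds, for every element of [d e],
% its index into g
[g, ~, gi] = unique([d e]);
gi = gi(:);                              % force a column for accumarray

% count only the entries that came from e (everything after the first numel(d))
gn = accumarray(gi(numel(d)+1:end), 1, [numel(g) 1]).';

% g is in sorted order, so reorder the counts to follow d
[~, pos] = ismember(d, g);
count_string = gn(pos);
```

Like the GRP2IDX version, this concatenates d and e first, so every string in e gets an index even if it is missing from d; such strings simply do not show up in count_string.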

yuk
  • This works perfectly well! Thanks a lot. Yes, d array contains only unique strings. Also, I need to perform this function on 13 other cell arrays of strings each approximately 1x140. Is there a way to speed up the process? I have to calculate the cosine similarity as well for the 13 vectors I get out of the above operation. My matlab compiler is taking forever to execute the code. Can you please help? – user1222437 Feb 21 '12 at 18:20
  • The grp2idx function saved me! I did not know about this before. Another new thing learned in MATLAB. Thanks for your help. – user1222437 Feb 22 '12 at 23:06
  • If the answer is helpful consider upvoting and/or accepting it. – yuk Feb 22 '12 at 23:15

Start with

count_string = cell(size(d));

And you are indexing into e, but controlling the loop on the size of d.

for i=1:length(d)
   count_string{i}=sum(ismember(d{i},e));
end
macduff
  • I don't have matlab with me right now, but I think this should get you headed in the right direction. – macduff Feb 21 '12 at 03:45
  • I tried the above code and 'count_string' displays 1 for each string in d that occurs in e. But how do I count the number of occurrences of each string? As in, if 'body' appears twice and 'muscle' once in e, then count_string should display (0,2,1,0). Thanks for your help. – user1222437 Feb 21 '12 at 04:02
  • I just formatted the answer, didn't change the code. The first line should be `count_string = zeros(size(d));`. In the for-loop: `count_string(i) = ...`. And `d` and `e` should be switched as in the question. – yuk Feb 21 '12 at 04:37

With regard to the errors with your original for-loop solution, macduff already mentioned them:

  • You need to initialize count_string using either CELL or ZEROS:

    count_string = cell(size(d));   %# A 1-by-4 cell array
    %# OR
    count_string = zeros(size(d));  %# A 1-by-4 numeric array
    
  • When adding values to count_string, you should use ()-indexing for numeric arrays and {}-indexing for cell arrays.

  • You need to swap d and e in your call to ISMEMBER, and index d with the loop variable instead of e.
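
Putting those three fixes together, a corrected loop might look like the sketch below. It uses the sample d and e from the question and picks the numeric-array variant, so the assignment uses ()-indexing.

```matlab
d = {'nerve','body','muscle','bone'};
e = {'body','body','muscle'};

count_string = zeros(size(d));      % 1-by-4 numeric array
for i = 1:numel(d)
    % ismember(e, d{i}) is a logical vector marking the entries of e
    % that equal d{i}; summing it gives the number of occurrences
    count_string(i) = sum(ismember(e, d{i}));
end
```

With the cell-array variant, the only changes would be `cell(size(d))` for the initialization and `count_string{i}` for the assignment.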

Regarding alternatives to using a loop, yuk gave you one solution using CELLFUN. Another vectorized solution is to use a combination of ISMEMBER and ACCUMARRAY:

>> [~, index] = ismember(e,d);  %# Find where each entry in e occurs in d
>> count_string = accumarray(index.', 1, [numel(d) 1]).'  %# Accumulate indices

count_string =

     0     2     1     0
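
For the follow-up case in the question (many documents counted against one d_union, then compared by cosine similarity), the same ISMEMBER/ACCUMARRAY step could be applied once per document, without an outer loop over d_union. This is a rough sketch under assumptions: the documents are collected in a cell array docs (a hypothetical name), the tiny d_union and docs below are made-up stand-ins for the real data, and strings missing from d_union are simply dropped.

```matlab
% Hypothetical stand-ins for d_union and for d1..d13, testdoc
d_union = {'nerve','body','muscle','bone'};
docs = {{'body','body','muscle'}, {'body','bone'}};

counts = zeros(numel(docs), numel(d_union));   % one row per document
for k = 1:numel(docs)
    [~, index] = ismember(docs{k}, d_union);   % position of each word in d_union
    index = index(index > 0);                  % drop words absent from d_union
    counts(k,:) = accumarray(index(:), 1, [numel(d_union) 1]).';
end

% cosine similarity between the count vectors of documents 1 and 2
cos_sim = dot(counts(1,:), counts(2,:)) / ...
          (norm(counts(1,:)) * norm(counts(2,:)));
```

Keeping the counts in one matrix, rather than in separately numbered variables, means each pairwise similarity is just an operation on two rows.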
gnovice