4

I have a cell array. Each cell contains a vector of variable length. For example:

example_cell_array=cellfun(@(x)x.*rand([length(x),1]),cellfun(@(x)ones(x,1), num2cell(ceil(10.*rand([7,4]))), 'UniformOutput', false), 'UniformOutput', false)

I need to concatenate the contents of the cells along one dimension and then perform an operation on each concatenated vector, generating a scalar for each column of my cell array (sum(), for example; the actual operation is complex, time-consuming, and not naturally vectorisable, especially for vectors of different lengths).

I can do this with loops easily (for my concatenated vector sum example) as follows:

[M N]=size(example_cell_array);
result=zeros(1,N);
cat_cell_array=cell(1,N);
for n=1:N
    cat_cell_array{n}=[];
    for m=1:M
        cat_cell_array{n}=[cat_cell_array{n};example_cell_array{m,n}];
    end
end
result=cell2mat(cellfun(@(x)sum(x), cat_cell_array, 'UniformOutput', false))

Unfortunately this is WAY too slow. (My cell array is 1Mx5 with vectors in each cell ranging in length from 100-200)

Is there a simple way to produce the concatenated cell array where the vectors contained in the cells have been concatenated down one dimension?

Something like:

dim=1;
cat_cell_array=?concatcells?(dim,example_cell_array);

Edit: Since so many people have been testing the solutions, just FYI: the function I'm applying to each concatenated vector is circ_kappa(x), available from the Circular Statistics Toolbox.

Divakar
Mr Purple
  • Does the cell array contain only numeric data in each cell? – Divakar Nov 03 '14 at 05:20
  • Each cell contains a vector of variable length (width 1) – Mr Purple Nov 03 '14 at 05:51
  • Also, on the `operation` part: just because the vectors have different lengths doesn't necessarily mean that it's not vectorizable. A few examples I came across on this - [Ex. 1](http://stackoverflow.com/questions/26065756/count-unique-rows-in-a-cell-full-of-vectors/26066092#26066092), [Ex. 2](http://stackoverflow.com/questions/25851305/fastest-way-of-finding-repeated-values-in-different-cell-arrays-of-different-siz/25851816#25851816) – Divakar Nov 03 '14 at 05:53
  • And, I would think [Andrew's solution](http://stackoverflow.com/a/26707501/3293881) posted here to be quite fast. – Divakar Nov 03 '14 at 05:55
  • @MrPurple I did some tests. My solution and Andrew's produce the same results, but the latter is faster. – Autonomous Nov 03 '14 at 06:32

4 Answers

2

For the concatenation itself, it sounds like you might want the functional form of cat:

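N = size(example_cell_array, 2);    % number of columns
cat_cell_array = cell(1, N);        % preallocate the output
for n=1:N
    cat_cell_array{n} = cat(1, example_cell_array{:,n});
end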

This will concatenate all the arrays in the cells in each column in the original input array.
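To see what the comma-separated list `example_cell_array{:,n}` is doing, here is a minimal sketch on a small hand-made cell array. The names `C` and `joined` and the data are illustrative only, not taken from the question.

```matlab
% Illustrative 2x2 cell array of column vectors with different lengths.
C = {[1; 2],    [10];
     [3; 4; 5], [20; 30]};

N = size(C, 2);
joined = cell(1, N);
for n = 1:N
    % C{:,n} expands to a comma-separated list,
    % so this call is equivalent to cat(1, C{1,n}, C{2,n}).
    joined{n} = cat(1, C{:, n});
end

disp(size(joined{1}));   % size of the first concatenated column
disp(size(joined{2}));   % size of the second concatenated column
```

Each entry of `joined` is a single column vector holding every value from the corresponding column of `C`, in row order.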

Andrew Janke
  • I think it's worth commenting that this is easier to follow than @Nosratinia's "arrayfun" wrapped version, and the speed loss was less than 1%. Also, it may be more directly usable if people exchange "N" for "size(example_cell_array,2)". Then they don't have to also include my poor example to generate N. – Mr Purple Nov 04 '14 at 19:26
2

Some approaches unpack the numeric data from example_cell_array using {..}, concatenate it, and then pack it back into larger cells to form your cat_cell_array. You then have to unpack the numeric data from that concatenated cell array again to perform your operation on each cell.

In my view, this repeated unpacking and packing is inefficient if cat_cell_array isn't one of your intended outputs. With that in mind, let me suggest two approaches here.


Loopy approach

The first one is a for-loop code -

data1 = vertcat(example_cell_array{:}); % extract all numeric data once, column by column
starts = [1 sum(cellfun('length',example_cell_array),1)]; % 1 followed by each column's total length
idx = cumsum(starts); % start index of each column's interval in data1
result = zeros(1,size(example_cell_array,2));
% replace this with "result(size(example_cell_array,2))=0;" for performance
for k1 = 1:numel(idx)-1
    result(k1) = sum(data1(idx(k1):idx(k1+1)-1));
end

Replace sum with your actual operation.
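One way to make that substitution explicit is to keep the operation in a function handle. The sketch below assumes the cells hold column vectors, as in the question. The handle name `op` and the small random input are illustrative assumptions; `@sum` stands in for the real operation (for example `@circ_kappa`).

```matlab
% Illustrative input: 3x2 cell array of column vectors of varying length.
example_cell_array = {rand(2,1), rand(4,1);
                      rand(3,1), rand(1,1);
                      rand(5,1), rand(2,1)};

op = @sum;   % placeholder: substitute the real per-column operation here

data1 = vertcat(example_cell_array{:});                  % all values, column by column
lens  = sum(cellfun('length', example_cell_array), 1);   % total length per column
idx   = cumsum([1 lens]);                                % start index of each column's block

result = zeros(1, numel(lens));
for k = 1:numel(lens)
    result(k) = op(data1(idx(k):idx(k+1)-1));            % apply op to one column's slice
end
```

Because `op` only ever sees a plain numeric slice of `data1`, no intermediate concatenated cell array is built.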


Almost-vectorized approach

If example_cell_array has a lot of columns, my second suggestion is an almost vectorized approach, though it doesn't perform badly with a small number of columns either. This code uses cellfun on the first line to get the total length of each concatenated column. cellfun is basically a wrapper around a loop, but it is not very expensive in terms of runtime here, which is why I categorize this approach as almost vectorized.

The code would be -

lens = sum(cellfun('length',example_cell_array),1); % total length of each concatenated column
maxlens = max(lens);
numlens = numel(lens);
array1 = zeros(maxlens,numlens); % zero-padded array, one column per concatenated vector
array1(bsxfun(@ge,lens,(1:maxlens)')) = vertcat(example_cell_array{:}); % fill only the valid positions
result = sum(array1,1);

What you need to do now is make your operation run column-wise on array1, using the mask created by the bsxfun call. That is, if array1 is an M x 5 array, you need to select the valid elements of each column using the mask and then apply the operation to those elements. Let me know if you need more info on the masking.
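A sketch of that masking step, assuming the padded array and mask are built as in the code above. The names `mask` and `op` and the small input are illustrative; `@mean` is used as the placeholder because, unlike sum, it would be skewed by the zero padding.

```matlab
example_cell_array = {[1; 2], [10]; [3; 4; 5], [20; 30]};   % illustrative data

lens = sum(cellfun('length', example_cell_array), 1);   % total length per column
mask = bsxfun(@ge, lens, (1:max(lens))');               % true where a column holds real data
array1 = zeros(size(mask));
array1(mask) = vertcat(example_cell_array{:});          % fill the valid positions

op = @mean;   % placeholder for an operation that is sensitive to padding
result = zeros(1, numel(lens));
for k = 1:numel(lens)
    result(k) = op(array1(mask(:, k), k));              % only the valid rows of column k
end
```

This still loops over columns, so it pays off mainly when the operation itself can be rewritten to act on the whole padded array at once.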

Hope one of these approaches would work for you!


Quick Tests: Using a 250000x5 sized example_cell_array, quick tests show that both these approaches for the sum operation perform very well and give about 400x speedup over the code in the question at my end.

Divakar
  • Took a lot longer to assess. But the loopy method here was by far the fastest for the reason stated: I had to run a function on each of the 5 columns and didn't actually need the concatenated array. – Mr Purple Nov 05 '14 at 02:06
  • It's worth noting that this forced me to check the orientation of the vectors in my cell array, because vertcat was failing on my real data. I found that my data was NOT arranged correctly for vertcat and had to transform it for the 'loopy' method using cellfun(@(C)ctranspose(C),(example_cell_array), 'UniformOutput', false); The resulting concatenated vertical vectors were then processed by my function orders of magnitude faster. Thanks – Mr Purple Nov 05 '14 at 02:09
  • @MrPurple Good to see this finally getting through to you! Well, the problem stated `rand([length(x),1])`, so I assumed `N x 1` sized arrays in each cell, but it seems you have `1 x N`. So, actually you can just use `horzcat` instead of `vertcat`, and it should speed up the code much more than `cellfun(@(C)ctranspose(C)..`. Give that a try? Also, at my end with such `1M` cases I was getting whopping speedups of `400x`. So, did you get any figure on the magnitude of speedups? I am kinda excited to hear about those :) – Divakar Nov 05 '14 at 04:15
  • ~20 seconds for the process in the loop vs. 24hrs+ for the horizontal vectors processed after concatenation using the cell2mat wrapped function I proposed. NB: there was also a ~5% saving in memory after concatenating to vertical vectors. Significant when you consider my array was about 6GB. So in addition to the speed up from processing the data in place, I guess it's pretty important to get your data into vertical arrays for processing if at all possible. NB2: the transpose function was very fast ~1 or 2 seconds - even on my 6GB variable. – Mr Purple Nov 05 '14 at 07:08
  • 1
    @MrPurple Wow! That's like magic :) Yeah I guess indexing into vertically shaped arrays does help. I think this will go into my profile for the help of future readers. Appreciate you sharing the figures! – Divakar Nov 05 '14 at 07:13
  • @MrPurple So, are you saying `cellfun(@(C)ctranspose(C),(example_cell_array), 'UniformOutput', false);` seemed faster than `horzcat(example_cell_array{:})`, assuming `horzcat` version worked for you? – Divakar Nov 05 '14 at 07:16
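
The orientation issue raised in these comments can be guarded against up front. A minimal sketch, assuming every cell holds a vector (row or column); the names `C` and `C_cols` and the data are illustrative. `v(:)` reshapes either orientation into a column.

```matlab
% Illustrative mix of row and column vectors.
C = {[1 2 3], [4; 5];
     [6 7],   [8 9 10]};

% Force every cell to a column vector so vertcat / cat(1, ...) cannot fail.
C_cols = cellfun(@(v) v(:), C, 'UniformOutput', false);

data1 = vertcat(C_cols{:});   % all values, taken column by column of C
```

Whether this is faster than switching to `horzcat` on the original data is not something this sketch measures.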
1

You can define a function like this:

cellcat = @(C) arrayfun(@(k) cat(1, C{:, k}), 1:size(C,2), 'uni', 0);

And then just use

>> cellcat(example_cell_array)
ans = 
    [42x1 double]    [53x1 double]    [51x1 double]    [47x1 double]
Mohsen Nosratinia
0

I think you are looking to generate cat_cell_array without using for loops. If so, you can do it as follows:

cat_cell_array=cellfun(@(x) cell2mat(x),num2cell(example_cell_array,1),'UniformOutput',false);

I believe the line above can replace your entire for loop. You can then compute your complex function over this cat_cell_array.
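
To unpack what that one-liner does: `num2cell(C,1)` groups each column of the cell array into its own M-by-1 cell, and `cell2mat` then stacks the vectors inside it. A small sketch with illustrative data (the names `C` and `cols` are not from the question):

```matlab
C = {[1; 2], [10]; [3; 4; 5], [20; 30]};   % illustrative data

cols = num2cell(C, 1);   % 1x2 cell; cols{n} is the 2x1 cell C(:,n)
cat_cell_array = cellfun(@cell2mat, cols, 'UniformOutput', false);
```

This relies on every vector in a column having the same width, so that `cell2mat` can stack them vertically.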

If only result is important to you and you do not want to store cat_cell_array, then you can do everything in a single line (not recommended for readability):

result=cell2mat(cellfun(@(x)sum(x), cellfun(@(x) cell2mat(x),num2cell(example_cell_array,1),'Uni',false), 'Uni', false)); 
Autonomous