5

I have an array:

a = [109, 894, 566, 453, 342, 25]

and another cell array of sub-indices of a, denoted as:

subs = { [1,3,4], [2,5,6], [1,3], [3,4], [2,3,4], [6] };    

I want to avoid the for-loop to calculate the following summations via MATLAB:

for i=1:6
    sums_a(i) = sum(a(subs{i}));
end

Is there any fast way such as arrayfun to implement this? Thanks.

Shai
  • 111,146
  • 38
  • 238
  • 371
John Smith
  • 617
  • 3
  • 16
  • 1
    I fixed the curly brackets in the assignment of `l` (note: `l` is about the worst variable name out there). – Jonas May 26 '13 at 08:20
  • 2
    @JohnSmith Why do you want to avoid a loop? `arrayfun`/`cellfun` is usually slower than a loop. – Eitan T May 26 '13 at 08:37
  • Oh, really? I have been told that `arrayfun` would be more efficient, but I'm not sure. – John Smith May 26 '13 at 08:44
  • @JohnSmith - see [this question](http://stackoverflow.com/questions/16143314/matlab-arrayfun-cellfun-spfun-and-structfun-vs-simple-for-loop) discussing loops and *-`fun` for iterating over arrays/structs and cells. – Shai May 26 '13 at 11:53
  • FYI, if you want speed, all of the answers so far are considerably slower than the simple `for` loop you provided (it's more than four times faster than the "tricky" solution offered by @EitanT). Don't [be paranoid about `for` loops](http://www.matlabtips.com/matlab-is-no-longer-slow-at-for-loops/), especially small simple ones that have no branching (`if` statements). Ten years ago maybe, but today many times a `for` loop will be both convenient and the fastest solution. In terms of code readability `for` loops are straightforward. – horchler May 26 '13 at 21:43
  • @horchler Interesting, I've wrongly underestimated the for loop here :) However, I only get a twofold speedup on my machine (perhaps it was x4 faster because you didn't clear the variable in the timing loop and let JIT skew the results?) Anyway, I've updated my answer following your comment (if you don't mind). – Eitan T May 26 '13 at 23:55
  • @EitanT: You are correct -I had omitted the `clear`. However, I did so from all cases (effectively) and used preallocation each time in the `for` loop, i.e., `sums_a3(6,1) = 0;`. (I also did warmup before starting timing.) You're including in your timing how long it takes to clear a vector. If you look closely (move the `tic` and `toc` function inside the loop) I think you'll find that clearing the variables is not an insignificant portion of your times (more so for smaller times). – horchler May 27 '13 at 00:43
  • @horchler A more accurate timing would've indeed involved moving the `tic` and `toc` inside the loops and adding up the intervals between them. However, I clear the variables in my benchmark, so all timings should be biased by more or less the same value, I think I'm fine with that. Don't get me wrong, that doesn't take away from your comment about the loop :) – Eitan T May 27 '13 at 00:49
  • Precisely. Measuring the `clear` by itself, I see that it takes about 1/3 of the total time in the `for` loop case on my machine (1000 iterations), hence the discrepancy. – horchler May 27 '13 at 00:52
  • @horchler Sometimes it's quite surprising how and why JIT-accelerated loops are faster than vectorized code, and vice versa. It is always good practice to try both approaches and see what's faster. Can't wait until the `*fun` functions become accelerated too. – Eitan T May 27 '13 at 00:57

2 Answers2

8

use cellfun

sums_a = cellfun( @(sx) sum( a(sx) ), subs );

PS,
It is best not to use i and j as variable names in Matlab.

Community
  • 1
  • 1
Shai
  • 111,146
  • 38
  • 238
  • 371
2

If you're looking for speed, arrayfun can be rather slow. As commented by Andrew Horchler, in latest releases of MATLAB for loops can be pretty fast thanks to JIT acceleration. If you still insist on avoiding the loop, here's a tricky solution without for loops that employs accumarray:

idx = cumsum(cellfun('length', subs));
x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
x = sum(bsxfun(@times, x', 1:numel(subs)), 2);  %'// Produce subscripts
y = a([subs{:}]);                               % // Obtain values
sums_a = accumarray(x, y);                      % // Accumulate values

This can be written as a (rather long) one-liner actually, but it's split into several lines for clarity.

Explanation

This values to be accumulated are obtained like so:

y = a([subs{:}]);

In your example their corresponding indices should be:

1    1    1    2    2    2    3    3    4    4    5    5    5    6

That is:

  1. The first 3 values from y are accumulated and the result is stored as the first element in the output.
  2. The next 3 values are accumulated and the result is stored as the second element in the output.
  3. The next 2 values are accumulated and the result is stored as the third element in the output.

and so on...

The following lines do the magic to produce such a vector of indices x:

idx = cumsum(cellfun('length', subs));
x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
x = sum(bsxfun(@times, x', 1:numel(subs)), 2);

Lastly, x and y are fed into accumarray:

sums_a = accumarray(x, y);

and voila.

Benchmark

Here's the benchmarking code:

a = [109,894,566,453,342,25];
subs = {[1,3,4], [2,5,6], [1,3], [3,4], [2,3,4], 6};

% // Solution with arrayfun
tic
for k = 1:1000
    clear sums_a1
    sums_a1 = cellfun( @(subs) sum( a(subs) ), subs );
end
toc

% // Solution with accumarray
tic
for k = 1:1000
    clear sums_a2
    idx = cumsum(cellfun('length', subs));
    x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
    x = sum(bsxfun(@times, x', 1:numel(subs)), 2);
    sums_a2 = accumarray(x, a([subs{:}]));
end
toc

%'// Solution with for loop
tic
for k = 1:1000
    clear sums_a3
    for n = 1:6
        sums_a3(n) = sum(a(subs{n}));
    end
end
toc

The results on my machine are:

Elapsed time is 0.027746 seconds.
Elapsed time is 0.004228 seconds.
Elapsed time is 0.001959 seconds.

There's an almost tenfold speedup for accumarray vs. arrayfun, but notice that the for loop still beats both.

Community
  • 1
  • 1
Eitan T
  • 32,660
  • 14
  • 72
  • 109