Finding sums of array elements that exceed a given value?

Question

I am trying to take advantage of vectorization in MATLAB for this, but I might have to resort to for loops. I really don't want to do that! Time to learn algorithms.

Given this (11-by-3) array:

x = [...
4.9000   -0.1000   -5.1000
4.6000   -0.4000   -5.4000
3.0000   -2.0000   -7.0000
2.9000   -2.1000   -7.1000
2.9000   -2.1000   -7.1000
2.9000   -2.1000   -7.1000
2.8000   -2.2000   -7.2000
2.7000   -2.3000   -7.3000
2.7000   -2.3000   -7.3000
2.2000   -2.8000   -7.8000
1.8000   -3.2000   -8.2000
];

I want to find all of the 3^11 = 177147 possible sums of 11 elements in the array, where each of the 11 elements comes from a different row. I want to then store the sums that exceed a threshold value of 16.0, along with the 11 elements that make up each of those sums, in a (12-by-?) array.

Any ideas to get me started? Thanks for the help.

score 2 · Accepted Answer · edited Sep 27 '17 at 19:00

Here's how to do it in a vectorized way:

TR = 16;

sets = num2cell(single(x),2);

c = cell(1, numel(sets));
[c{:}] = ndgrid( sets{:} );
cartProd = cell2mat( cellfun(@(v)v(:), c, 'UniformOutput',false) );

validRows = cartProd(sum(cartProd,2) > TR,:); % output is [353x11]

Notice how I use single to save space and make the computation slightly faster.

The above solution is an adaptation of this answer.

Upon further contemplation, I think I've come up with a way that should be both faster and more memory efficient. We do this by indexing x, and then doing the previous process on the indices. Why is this better, you might ask? This is because we can store the indices as uint8, which consumes considerably less memory than double or even single. Also we get to keep the full double precision of x. Thus:

function validRows = q42933114(x,thresh)
%% Input handling
if nargin < 2
  thresh = 16;
end
if nargin < 1
  x = [...
    4.9000   -0.1000   -5.1000
    4.6000   -0.4000   -5.4000
    3.0000   -2.0000   -7.0000
    2.9000   -2.1000   -7.1000
    2.9000   -2.1000   -7.1000
    2.9000   -2.1000   -7.1000
    2.8000   -2.2000   -7.2000
    2.7000   -2.3000   -7.3000
    2.7000   -2.3000   -7.3000
    2.2000   -2.8000   -7.8000
    1.8000   -3.2000   -8.2000
  ];
end

I = reshape(uint8(1:numel(x)),size(x));

sets = num2cell(I,2);

c = cell(1, numel(sets));
[c{:}] = ndgrid( sets{:} );
cartProd = cell2mat( cellfun(@(v)v(:), c, 'UniformOutput',false) );
validRows = x(cartProd(sum(x(cartProd),2) > thresh,:));

Memory consumption comparison:

Method 1 (old):

>> whos
  Name                Size              Bytes  Class     Attributes

  c                   1x11            7795700  cell                
  cartProd       177147x11            7794468  single              
  sets               11x1                1364  cell                
  validRows         353x11              15532  single

Method 2 (new):

>> whos
  Name                Size              Bytes  Class     Attributes

  c                   1x11            1949849  cell                
  cartProd       177147x11            1948617  uint8               
  sets               11x1                1265  cell                
  validRows         353x11              31064  double

We see that the memory consumption is indeed smaller (in about 4 times), as expected.

Runtime comparison:

Method 1 -- 0.0110
Method 2 -- 0.0186

Here we see that the 2^nd method is actually a bit slower. When profiling this, we see that the cause is x(...) which is relatively expensive.

Ah, thanks! I was working on a solution too, and I got it just now, albeit much less efficiently. I ended up using `ndgrid` and `spalloc`. — abscissa, Mar 21 '17 at 17:12
@abscissa you might want to post your solution as another answer. — Dev-iL, Mar 21 '17 at 17:15

score 0 · Answer 2 · answered Mar 21 '17 at 17:22

I did it this way. There is obviously room for improvement in the variable names.

Notice there are 353 matching rows, which agrees with the answer from @Dev-iL.

p = 11;
[a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11] = ...
    ndgrid(x(1,:),x(2,:),x(3,:),x(4,:),x(5,:),x(6,:),x(7,:),x(8,:),x(9,:),x(10,:),x(11,:));
a = a1+a2+a3+a4+a5+a6+a7+a8+a9+a10+a11;
y = spalloc(p+1,3^p,(p+1)*3^p);
for i = 1:3^p
    if a(i) >= 16.1
        y(:,i) = [a1(i),a2(i),a3(i),a4(i),a5(i),a6(i),a7(i),a8(i),a9(i),a10(i),a11(i),a(i)];
    end
end
nnz(y(p+1,:)); % 353 rows matching the criteria

score -1 · Answer 3 · answered Mar 21 '17 at 16:43

I don't think you'll have better luck than using a for loop. There could be a Matlab function for generating all 3^11 combinations, and use that as a sort of index, but you'd have a lot of memory consumption that way.

The code would be hard to read, as well.

However, recent versions of Matlab don't behave that badly wrt for-loops, because the they JIT the code. Before it was just interpreted, or JIT-ing was used for specific purposes. You wouldn't want to reimplement the matrix routines in Matlab because of this, but for simple code like this one it should perform well.

Finding sums of array elements that exceed a given value?

3 Answers3

Memory consumption comparison:

Runtime comparison: