
I have the following function that works perfectly, but I would like to apply vectorization to it...

for i = 1:size(centroids,1)
    centroids(i, :) = mean(X(idx == i, :));
end

For each i, it selects the rows of X whose idx equals i and computes the mean of those rows.

This is my attempt at vectorization. It does not work, and I know why...

centroids = mean(X(idx == [1:size(centroids,1)], :));

The comparison idx == [1:size(centroids,1)] breaks the code. I have no idea how to check whether idx equals any of the numbers from 1 to size(centroids,1).

tl;dr

Get rid of the for loop through vectorization

buydadip
  • Can you give an example of your X vector – ammportal May 08 '17 at 07:03
  • Actually, the for loop is redundant here. The value of idx will be equal to i only once. You can just use this line to accomplish that `centroids(idx, :) = mean(X(idx, :));`. However, in case you want to do something different, which by the way is not clear from your question, you should provide an example of values of `X` and the desired output for `centroids` – ammportal May 08 '17 at 07:10

2 Answers


One option is to use arrayfun:

nIdx      = size(centroids,1);
% mean(...,1) averages down the rows even when a cluster has a single member
centroids = arrayfun(@(ii) mean(X(idx==ii,:),1),1:nIdx, 'UniformOutput', false);
centroids = vertcat(centroids{:})

Since each function call returns a row vector rather than a scalar, the UniformOutput option has to be set to false. arrayfun then returns a cell array, which you need to vertcat to get the desired double array.
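
To make the cell-array step concrete, here is a minimal sketch on toy data. The values of X and idx are made up for illustration, and idx is assumed to be a column of integer labels from 1 to nIdx.

```matlab
% Toy data (hypothetical values, chosen only for illustration)
X    = [1 2; 3 4; 5 6; 7 8];
idx  = [1; 2; 1; 2];
nIdx = 2;

% Each call returns a 1-by-2 row vector, so the results come back in a cell array
c = arrayfun(@(ii) mean(X(idx == ii, :), 1), 1:nIdx, 'UniformOutput', false);

% Stack the per-cluster rows into an nIdx-by-2 double matrix
centroids = vertcat(c{:});
```

Here c is a 1-by-2 cell array holding one row of means per label, and centroids is the stacked numeric matrix.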

souty
  • `arrayfun` has more or less the same performance as an explicit loop; it doesn't really count as vectorization – Luis Mendo May 08 '17 at 10:57
  • Well, arrayfun is a built-in function, i.e. pre-compiled. In my experience it gives you better performance than an explicit loop. – souty May 08 '17 at 13:05
  • It depends on the specific case and on the Matlab version, but generally `arrayfun` [is](https://es.mathworks.com/matlabcentral/answers/324130-is-arrayfun-faster-much-more-than-for-loop) (or [used](http://es.mathworks.com/matlabcentral/newsreader/view_thread/260563) to be) [slower](http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why) – Luis Mendo May 08 '17 at 13:14

You can split the matrix into cells and take the mean of each cell using cellfun (which loops internally):

generate data:

dim = 10;
N = 400;
nc = 20;
idx = randi(nc,[N 1]);
X = rand(N,dim);
centroids = zeros(nc,dim);

mean using loop (the question's method):

for i = 1:size(centroids,1)
    centroids(i, :) = mean(X(idx == i, :));
end

vectorizing:

% split X into cells by idx
A = accumarray(idx, (1:N)', [nc,1], @(i) {X(i,:)});
% mean of each cell
C = cell2mat(cellfun(@(x) mean(x,1),A,'UniformOutput',0));

maximum absolute error between the methods:

max(abs(C(:) - centroids(:))) % about 1e-16
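
For comparison, a different technique from the accumarray/cellfun one above: a loop-free sketch that builds a membership matrix and uses a matrix product. It assumes idx is an N-by-1 column of integer labels in 1..nc, and the toy values are made up for illustration. An empty cluster would give a row of NaN (0/0), which matches what mean returns for an empty selection.

```matlab
% Toy data (hypothetical values, chosen only for illustration)
X   = [1 2; 3 4; 5 6; 7 8];
idx = [1; 2; 1; 2];
nc  = 2;

% N-by-nc logical matrix: M(n,k) is true when row n belongs to cluster k.
% bsxfun is used so this also runs on releases without implicit expansion
M = bsxfun(@eq, idx, 1:nc);

% Sum the rows of each cluster, then divide by the cluster sizes
sums   = double(M)' * X;              % nc-by-dim sums per cluster
counts = sum(M, 1)';                  % nc-by-1 number of rows per cluster
C = bsxfun(@rdivide, sums, counts);   % nc-by-dim centroids
```

On releases with implicit expansion (R2016b and later), idx == 1:nc and sums ./ counts should give the same result without bsxfun. Note that M takes N*nc elements of memory, so this trades space for the absence of a loop.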
user2999345