6

accumarray uses two rows of indices to create a matrix with elements on the location of valid index pairs with a value assigned by the specified function, e.g.:

A = [11:20]; 
B = flipud([11:20]); 
C = 1:10;
datamatrix = accumarray([A B],C);

This way datamatrix will be a 20x20 matrix with values. If the values of A and B however are very large, this will result in a mostly empty matrix, with a small batch of data in the far corner. To circumvent this, one might set accumarray to issparse:

sparsedatamatrix = accumarray([A B],C,[],@sum,[],true);

This will save a lot of memory in case min(A) and/or min(B) is/are very large.

My problem, however, is that I have a Mx7 matrix, with M~1e8, on which I want to collect the means of columns three through seven based upon indexing in the first two columns and the standard deviation of the third column based upon the third as well:

result = accumarray([data(:,1) data(:,2)],data(:,3),[],@std);

I want to save this back into a table, structured as [X Y Z std R G B I], where X and Y are the indices, Z is the average height of that pixel,R, G, B and I are mean values (colour and intensity) per pixel and std is the standard deviation of heights (i.e. the roughness). Using the issparse in this case does not help, since I transform my matrices resulting from accumarray using repmat.

The point of this code is to estimate the height, roughness, colour and intensity of a piece of land from a point cloud. I rounded the coordinates in X and Y to create a grid and now need those average values per grid cell, but output as a "table" (not the MATLAB data type, but a 2D array which is not the default matrix output).

So, to conclude with the question:

Is there a way for accumarray or a similar function to output this table without intermediate (potentially very large) matrix?

Code below:

Xmax = max(Originaldata(:,1));
Ymax = max(Originaldata(:,2));
X_avg_grid=(Edgelength:Edgelength:Xmax*Edgelength)+Xorig;
TestSet = zeros(Xmax*Ymax,9);

xx = [1:length(X_avg_grid)]'; %#ok<*NBRAK>
TestSet(:,1) = repmat(xx,Ymax,1);
ll = 0:Xmax:Xmax*Ymax;
for jj = 1:Ymax
    TestSet(ll(jj)+1:ll(jj+1),2) = jj;
end

for ll = 1:7
    if ll == 2
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@std);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    elseif ll == 7
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],1);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    elseif ll == 1
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,3),[],@mean);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    else
        tempdat = accumarray([Originaldata(:,1) Originaldata(:,2)],Originaldata(:,ll+1),[],@mean);
        tempdat = reshape(tempdat,[],1);
        TestSet(:,ll+2) = tempdat;
    end
end

TestSet = TestSet(~(TestSet(:,9)==0),:);

The ninth column here is just the amount of points per cell.

Originaldata = 
19  36  2.20500360107422    31488   31488   31488   31611
20  37  2.26400360107422    33792   33792   34304   33924
20  37  2.20000360107422    33536   33536   34048   33667
19  36  2.20500360107422    34560   34560   34560   34695
20  36  2.23300360107422    32512   32512   33024   32639
21  38  2.22000360107422    31744   31488   33024   31611
21  37  2.20400360107422    32512   32768   33792   32896
21  37  2.24800360107422    29696   29440   30720   29555
21  38  2.34800360107422    32768   32768   32768   32639
21  37  2.23000360107422    33024   33024   33536   33153

So all points on the same X,Y (e.g. [19 36] or [21 37]) are averaged (height, RGB, intensity in that order) and of the values in the third column the standard deviation is also desired:

Result = 
19  36  2.2050036   0.00        33024   33024   33024       33153
21  37  2.227336934 0.02212088  31744   31744   32682.66    31868

and so forth for the rest of the data.

I updated my code to the latest version I have. This reduced memory overhead quite a bit, as the function now creates the grids one after another as opposed to all at once. However, the code is running in parallel so there are still eight simultaneous grids created, so a solution would still be appreciated.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
  • 1
    I'm not sure I follow. If you want to do like `accumarray` does but summing _rows_ instead of numbers, you can use (sparse) matrix multiplication as in [this Q&A](http://stackoverflow.com/questions/29906059/summing-rows-by-index-using-accumarray) – Luis Mendo Aug 24 '15 at 13:02
  • I cannot use sparse() since it does not support other functions than its native 'sum' and cannot make accumarray ouput a sparse matrix, since repmat cannot work with sparses. – Adriaan Aug 24 '15 at 13:03
  • 1
    I have a feeling that what you want _can_ be done with sparse matrix multiplication. But you'd have to create a minimal, working example with input and desired ouput; your question as it stands is not very clear – Luis Mendo Aug 24 '15 at 13:05
  • @Shai it does, ah, I see. The way I extract the X and Y coordinates (xx and the loop over 1:Ymax) does not allow for a sparse. I'd need a way to obtain the coordinates without extra memory, would 'find' work for that? – Adriaan Aug 24 '15 at 14:13
  • @A.Visser `find` should do the trick for you. – Shai Aug 24 '15 at 14:19
  • @A.Visser see [this](http://stackoverflow.com/a/32177570/1714410) answer as an example. – Shai Aug 24 '15 at 14:20
  • @Shai: "N-dimensional indexing allowed for Full matrices only." So I'd need a different form of daccum in case I want to use sparse matrices there. – Adriaan Aug 24 '15 at 14:21
  • @A.Visser you can have `daccum` as a cell array with a sparse matrix in each cell. – Shai Aug 24 '15 at 14:22
  • @Shai I think I see how I'd go about implementing that, but Im fully occupying my CPU for the next hour or so, so I'll not be able to test that in the mean time. Thanks! – Adriaan Aug 24 '15 at 14:24
  • Out of curiosity: I'd then use find for each of the seven cells. Each cell contains data on the same elements; would the output of find then follow the same row/column format? (e.g. row 3 col 4 in the first row of the output of find for all 7 sets of data) – Adriaan Aug 24 '15 at 14:26
  • @A.Visser in principle all `find`s should return the same order of `ii`s and `jj`s. **But**, if one of the entries computed by `accumarray` turns out to be zero - you will not get back the corresponding `ii` and `jj` for that specific entry. – Shai Aug 24 '15 at 14:28
  • That'll not be the case, since each observation has all variables, so if an XY pair contains one of the variables, it will contain the rest as well. – Adriaan Aug 24 '15 at 14:30
  • BTW, since you know all the indices in advance, you can convert them to linear indices (using `sub2ind`) and work with a sparse 2D matrix instead of sparse 3D. skipping the `find` and directly access only the relevant entries using the linear indices you compute in advance! – Shai Aug 24 '15 at 14:30

3 Answers3

3

A sketch of a solution using linear indices and 2D sparse matrix

lind = Originaldata(:,1) + max( Originaldata(:,1) ) * ( Originaldata(:,2) - 1 );
daccum(7,:) = accumarray( lind, 1, [], @sum, [], true ); %// start with last one to pre-allocate all daccum
daccum(1,:) = accumarray( lind, Originaldata(:,3), [], @mean, [], true );
daccum(2,:) = accumarray( lind, Originaldata(:,3), [], @std, [], true );
daccum(3,:) = accumarray( lind, Originaldata(:,4), [], @mean, [], true );
daccum(4,:) = accumarray( lind, Originaldata(:,5), [], @mean, [], true );
daccum(5,:) = accumarray( lind, Originaldata(:,6), [], @mean, [], true );
daccum(6,:) = accumarray( lind, Originaldata(:,7), [], @mean, [], true );

Now you can get only what you need

inter = [Originaldata(:,1), Originaldata(:,2), full( daccum(:,lind) )' ];
Shai
  • 111,146
  • 38
  • 238
  • 371
2

You can first use unique with the 'rows' option to find the indices of the unique pairs of X and Y coordinates, then instead use those indices as the subscript input in your calls to accumarray (you'll have to call it separately for each column, since accumarray doesn't handle matrix inputs):

[xyPairs, ~, index] = unique(Originaldata(:, 1:2), 'rows');
nPairs = max(index);
Result = [xyPairs ...
          accumarray(index, Originaldata(:, 3), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 3), [nPairs 1], @std) ...
          accumarray(index, Originaldata(:, 4), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 5), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 6), [nPairs 1], @mean) ...
          accumarray(index, Originaldata(:, 7), [nPairs 1], @mean) ...
          accumarray(index, ones(size(index)), [nPairs 1], @sum)];
gnovice
  • 125,304
  • 15
  • 256
  • 359
1

You could pre-process the data.

One thing you can achieve this way is remove undesired lines (such as those having two or less occurrences) so that you don't have to deal with 0 standard deviation:

%// Count occurences:
combined_coord = Originaldata(:,1)*1E6+Originaldata(:,2); %// "concatenating" the coords
[C,~,ic] = unique(combined_coord);
occurences = [C accumarray(ic,1)];
%// Find all points that have <=2 occurences:
coords_to_remove = occurences((occurences(:,2)<=2),1);
%// Find valid lines:
valid_lns = ~sum(bsxfun(@eq,combined_coord,coords_to_remove'),2); %'
%// Filter original data:
new_data = Originaldata(valid_lns,:);
Dev-iL
  • 23,742
  • 7
  • 57
  • 99