7

I am assuming that the mean function takes a matrix and calculates its mean by summing all elements of the array and dividing by the total number of elements.

I am using this function to calculate the mean of my matrix, but I have reached a point where I don't want the mean function to consider the 0 elements. Specifically, my matrix is a 1x100000 array, and maybe 1/3 to 1/2 of its elements are 0. In that case, can I replace the 0 elements with NULL so that MATLAB doesn't consider them when calculating the mean? What else can I do?

Shai
kuku
  • Replace those zeros with NaNs: `mat(mat==0)=NaN`. Then use `nanmean`. Be careful if you are dealing with floating point numbers though; in that case you need some tolerance value. If you manage to solve it, consider posting an answer on this. – Divakar Oct 29 '14 at 06:34
  • ok, let me try that. – kuku Oct 29 '14 at 06:34
  • What do you mean by a tolerance value with floating point numbers? What I am dealing with are floating point numbers. – kuku Oct 29 '14 at 06:38
  • Look [here](http://stackoverflow.com/questions/686439/why-is-24-0000-not-equal-to-24-0000-in-matlab) for that issue. So a quick solution could be with `mat(abs(mat) – Divakar Oct 29 '14 at 06:39
  • @kuku Divakar means be careful that your values of `0` are precisely `0` and not just a very tiny number that appears to be zero, like `0.00000000000000000000000001`, as this will not be found by `==0`. Rather use something like `mat(abs(mat) < tol)`, where `tol` is a tiny number. – Dan Oct 29 '14 at 06:40
  • Also, if you are only dealing with 1D arrays, Dan's solution should work. If you are working with 2D or multidimensional arrays, `nanmean` could be elegant, as with it you can specify the dimension you would like to work with, just like you can do with standard `mean`. – Divakar Oct 29 '14 at 06:47
  • @kuku you are aware that `mean` on a 2D matrix will not return a single number but a vector? – Shai Oct 29 '14 at 06:59
  • @Divakar I have never been a fan of manipulating data like this. Memory reuse is completely ok, but only if the old data is not supposed to be used anymore. Better to create a function for the specific purpose. Inside there all is safe, nothing is returned unless defined as output. The other thing is of course that other editors of the same code may only read parts of the code and may miss vital parts of the data manipulation. If this manipulation is encapsulated in the myMean function, this will not be a risk. This will be a little slower (but performance problems may need special solutions) – patrik Oct 29 '14 at 12:18
  • @patrik Of course, agreed on that. If data is to be re-used, it's always a safer and elegant method to have them function-encapsulated. – Divakar Oct 29 '14 at 12:26
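
The tolerance idea from the comments can be sketched as follows. This is a minimal illustration, not code from the thread: the helper name `meanNonzero` and the tolerance `1e-12` are assumptions made for the example, and a suitable tolerance depends on the scale of your data. The original array is left unmodified, in the spirit of patrik's comment.

```matlab
% Hypothetical helper (name is an assumption): mean of the entries whose
% magnitude is at least tol. Entries with abs(x) < tol count as zero.
meanNonzero = @(x, tol) mean(x(abs(x) >= tol));

M = [2, 0, 1e-20, 4];            % 1e-20 is numerically zero here, but M == 0 misses it
tol = 1e-12;                     % assumed tolerance, adjust to your data

exactMean = mean(M(M ~= 0));     % averages three elements: 2, 1e-20 and 4
tolMean   = meanNonzero(M, tol); % averages two elements: 2 and 4
```

With the exact comparison the tiny value still counts as an element and pulls the mean towards 2, while the tolerance-based version averages only 2 and 4.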

3 Answers

9

Short version:
Use nonzeros:

mean( nonzeros(M) );

A longer answer:
If you are working with an array of 100K entries, a significant share of which are 0, you might consider working with a sparse representation. It might also be worth storing it as a column vector rather than a row vector.

sM = sparse(M(:)); %// sparse column
mean( nonzeros(sM) ); %// mean of only non-zeros
mean( sM ); %// mean including zeros
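
A small sketch of the sparse variant on a short stand-in vector (the data and variable names are made up for illustration, not taken from the answer). `full` is applied to the second result on the assumption that `mean` of a sparse vector may return a sparse scalar.

```matlab
% Short stand-in for the 1x100000 array from the question
M = [0, 3, 0, 0, 5, 0, 1, 0];

sM = sparse(M(:));                % sparse column vector, stores only the nonzeros

avgNonzero  = mean(nonzeros(sM)); % (3 + 5 + 1) / 3 = 3
avgAll      = full(mean(sM));     % (3 + 5 + 1) / 8 = 1.125, zeros included
storedCount = nnz(sM);            % number of stored nonzero entries: 3
```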
Shai
5

As you were asking "What else can I do?", here is another approach, which does not depend on the Statistics Toolbox or any other toolbox.

You can compute the mean yourself by summing up the values and dividing by the number of nonzero elements (nnz()). Since summing up zeros does not affect the sum, this gives the desired result. For the 1-dimensional case, which you seem to have, this can be done as follows:

% // 1 dimensional case
M = [1, 1, 0, 4];
sum(M)/nnz(M) % // 6/3 = 2

For a 2-dimensional case (or n-dimensional case) you have to specify the dimension along which the summation should happen:

% // 2-dimensional case (or n-dimensional)
M = [1, 1, 0, 4
      2, 2, 4, 0
      0, 0, 0, 1];

% // column means of nonzero elements      
mean_col = sum(M, 1)./sum(M~=0, 1) % // [1.5, 1.5, 4, 2.5]

% // row means of nonzero elements
mean_row = sum(M, 2)./sum(M~=0, 2) % // [2; 2.667; 1.0]
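
One edge case worth keeping in mind with this approach: a column (or row) that contains no nonzero elements leads to 0/0, which evaluates to NaN. The sketch below uses a made-up matrix to show this; the variable names `counts` and `emptyCols` are illustrative assumptions.

```matlab
M = [1, 0, 4
     2, 0, 0
     0, 0, 1];

counts   = sum(M ~= 0, 1);       % nonzero count per column: [2, 0, 2]
mean_col = sum(M, 1) ./ counts;  % [1.5, NaN, 2.5], since 0/0 is NaN

emptyCols = counts == 0;         % logical mask of columns without nonzeros
```

Whether NaN is an acceptable result for such columns depends on the application; the mask lets you handle them separately if needed.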
Nras
3

To find the mean of only the non-zero elements, use logical indexing to extract the non-zero elements and then call mean on those:

mean(M(M~=0))
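
As a quick sanity check, the three approaches suggested on this page can be compared on a small vector. This is only an illustrative sketch with made-up data, assuming exact zeros (no tolerance handling).

```matlab
M = [1, 1, 0, 4];

a = mean(M(M ~= 0));     % logical indexing
b = mean(nonzeros(M));   % nonzeros
c = sum(M) / nnz(M);     % sum divided by the nonzero count

% All three evaluate to 6/3 = 2 for this vector
```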
Dan