
I want to compute, for each column, the average of the elements that are greater than zero, in a matrix defined as:

G =

     1     2     3     0     9     4
     0     1     3     4     0     0

If an element is zero, it should be ignored and not counted in the average. My expected result is

MeanG = 1/1  3/2  6/2  4/1  9/1  4/1

How can I do this in MATLAB?

rayryeng
John
  • Important question: does this matrix only contain integers, or does it contain floating-point numbers? – jub0bs Dec 05 '14 at 16:02

3 Answers


For a rather simple solution, if you have the Statistics Toolbox, replace all zeros and negative values with NaN and then use nanmean.

Therefore:

>> Gnan = G;
>> Gnan(Gnan <= 0) = NaN;
>> out = nanmean(Gnan)

out =

    1.0000    1.5000    3.0000    4.0000    9.0000    4.0000

I made a copy of G because I'm assuming you want to keep the original G for further analysis beyond computing the mean this way.
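As an aside, newer MATLAB releases (R2015a and later, as far as I know) let `mean` skip NaNs directly through its `'omitnan'` flag, so the same NaN trick works without the Statistics Toolbox. A minimal sketch, assuming such a release and the G from the question:

```matlab
% G from the question
G = [1 2 3 0 9 4;
     0 1 3 4 0 0];

Gnan = G;                         % work on a copy so G stays untouched
Gnan(Gnan <= 0) = NaN;            % mark zero/negative entries as missing
out = mean(Gnan, 1, 'omitnan')    % column means ignoring NaNs: [1 1.5 3 4 9 4]
```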


If you don't have access to nanmean, you can look at each column and determine how many zeros and negative values it contains. Once you know this, sum all values in each column that are neither zero nor negative, and divide by the number of such values in that column. Something like:

>> zero_neg = G <= 0;
>> Gcopy = G;
>> Gcopy(zero_neg) = 0;
>> out = sum(Gcopy) ./ (size(G,1) - sum(zero_neg))

out =

    1.0000    1.5000    3.0000    4.0000    9.0000    4.0000

The intricacy here is that we search for the elements that are zero or negative, make a copy of G, and set these elements in the copy to zero so that they don't get added into the sum. To get the correct mean, you then divide by the number of entries that are not zero or negative (or, put more simply, positive... see Nras's post).

Note that I'm also keeping a copy of G and mutating only the copy to compute the mean, as I'm assuming you'll want to keep the original G for further analysis.

Minor Note

Jubobs made a very good point. If this matrix contains floating-point numbers, it's dangerous to compare against an exact value like 0, because of limited precision and round-off error. For example, if some elements that you expect to be zero aren't exactly zero due to floating-point imprecision, they will be counted, and the result will not be the mean you want. Take a look at this post and this great answer by @gnovice for more details: Why is 24.0000 not equal to 24.0000 in MATLAB?
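If the data is floating-point, one common workaround is to compare against a small tolerance instead of exactly 0. This is only a sketch: the tolerance `tol` and the near-zero entry `1e-15` are made-up values for illustration, not taken from the question.

```matlab
% Hypothetical data: 1e-15 stands in for a value that "should" be zero
G = [1 2 3 1e-15 9 4;
     0 1 3 4     0 0];

tol  = 1e-10;                              % assumed threshold; choose one suited to your data's scale
keep = G > tol;                            % logical mask of entries that count as positive
out  = sum(G .* keep, 1) ./ sum(keep, 1)   % [1 1.5 3 4 9 4]
```

With a plain `G > 0` test, the fourth column would count 1e-15 as a valid entry and return roughly 2 instead of 4.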

rayryeng
  • Thanks rayryeng. My matrix contains floating-point values. However, don't worry: I can use a threshold to filter it, so elements smaller than the threshold will be set to zero. – John Dec 05 '14 at 17:50
  • @user8264 - OK, then it looks like you're set then! Good luck! – rayryeng Dec 05 '14 at 17:51
  • Nice approaches both, and the note is also very relevant here – Luis Mendo Dec 05 '14 at 18:27
  • @LuisMendo - Thanks :) – rayryeng Dec 05 '14 at 18:56
  • @Kamtal - Thanks!... It does require the Statistics Toolbox though, so the second approach works on more installations. Also, it can be simplified if you just search for all entries that are strictly greater than 0 and count how many there are to compute the mean... Speaking of which, I found a mistake. I'll update my post! – rayryeng Dec 05 '14 at 18:58
  • @rayryeng You're welcome! ;-) – Luis Mendo Dec 05 '14 at 18:59
  • @user8264 - If you no longer need help, consider accepting one of our answers. Please and thank you. – rayryeng Dec 10 '14 at 19:06

The default solution without any toolbox dependency would probably read:

G(G<0) = 0;  % not needed if G contains only non-negative numbers, as in your example
sum(G, 1)./sum(G~=0, 1)

ans =

1.0000    1.5000    3.0000    4.0000    9.0000    4.0000

We sum manually but divide only by the number of non-zero elements in each column. To make sure the sum runs over the correct dimension even for single-row matrices, the dimension should also be specified explicitly.
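To see why the dimension matters, take a single-row input (a made-up example): without the `1`, `sum` runs along the row and collapses it to a single number.

```matlab
G = [2 3 4];                             % hypothetical single-row matrix

wrong = sum(G) ./ sum(G ~= 0)            % 3: one value for the whole row
right = sum(G, 1) ./ sum(G ~= 0, 1)      % [2 3 4]: one mean per column
```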

Note that for a column containing only zeros (or negative values), this approach divides 0 by 0 and returns NaN for that column.
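If an all-zero column should give a number instead of NaN, one option is to clamp the count to at least 1, so that such a column reports 0. This is an assumption about what you want; NaN is arguably the more honest signal that a column had no valid entries.

```matlab
G = [1 2 0;
     0 1 0];                             % hypothetical: third column has no positive entries

G(G < 0) = 0;                            % drop negatives, as in the answer
counts  = sum(G ~= 0, 1);                % non-zero entries per column: [1 2 0]
naive   = sum(G, 1) ./ counts            % [1 1.5 NaN], since 0/0 is NaN
guarded = sum(G, 1) ./ max(counts, 1)    % [1 1.5 0]
```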

Nras

There are many ways to do it. You can count the zeros in each column of your matrix and exclude them when you calculate the mean:

z = sum(G == 0, 1)   % number of zeros in each column

If your other numbers are all positive, you can then directly compute something like

meanG = sum(G, 1) ./ (size(G, 1) - z)   % avoid naming this "mean": it would shadow the built-in function
rayryeng
GameOfThrows
  • I tried the second suggestion. It doesn't give the desired output. – kkuilla Dec 05 '14 at 16:12
  • Do you have negative values in your matrix? If you do, your mean would be wrong because G>0 will not count those cells; you can try G~=0 instead of G>0. Oh sorry, I didn't read properly: you asked for the mean of each column! The fastest method is sum(G, 1)./sum(G~=0, 1). The 1 means the sum is taken down each column; if you want rows, it's 2. – GameOfThrows Dec 05 '14 at 16:16
  • I tried it with the input the OP gave and expected to get the OP's expected result. – kkuilla Dec 05 '14 at 16:17
  • It doesn't work because you forgot the `dot` (`.`) operator. What you are doing is solving a system of linear equations with `/` (`mrdivide`). You probably meant to do: `sum(G)./size(G(find(G>0)),1)`. BTW, you should also rename `mean`, as `mean` is an actual function in MATLAB. You will unintentionally shadow that function, so if you try to use the actual function `mean` later, it won't work because it is now a variable. You have it right in your comment to kkuilla, but it isn't fixed in your actual answer. – rayryeng Dec 05 '14 at 16:19
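One more detail on the expression in that last comment: even with the dot, `G(find(G>0))` gathers every positive entry into one column vector, so `size(..., 1)` is a single count for the whole matrix rather than one count per column. A quick check with the question's G:

```matlab
G = [1 2 3 0 9 4;
     0 1 3 4 0 0];

total  = size(G(find(G > 0)), 1)   % 8: positives in the whole matrix
perCol = sum(G > 0, 1)             % [1 2 2 1 1 1]: positives in each column
meanG  = sum(G, 1) ./ perCol       % [1 1.5 3 4 9 4]
```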