I am calculating the mean of each row in a data frame, but even when all the values in a row are valid numbers, the mean still comes out as NaN.

The dtype is float64 for all the columns, but the mean comes out as NaN for every row (the last column in the image is the mean). I excluded the first column (the timestamp column) from the calculation. I also tried specifying skipna=True, but that did not help.

The statement I used is df[].mean(axis=0,skipna=True). Some rows contain NaN values, but those rows are only a tiny fraction of the total row count, and I would expect skipna=True to take care of the NaNs.

Any reasons why this could be happening and how to proceed?

FObersteiner
user34482
  • Row means should be `axis=1`, not 0. – tdy Feb 23 '23 at 05:13
  • It's coming out `nan` because you computed column means, so the index of that resulting series is the column names. Then when you assign that to a column, it tries to align those column names with the row indexes and can't do it, so it just gives all `nan`. – tdy Feb 23 '23 at 05:26
  • Related: [Why does `mean(axis=1)` compute row means instead of column means?](https://stackoverflow.com/q/22149584/13138364) – tdy Feb 23 '23 at 05:41
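The alignment behaviour described in the comments above can be reproduced with a small sketch. The frame and the column names (`timestamp`, `a`, `b`, `wrong`, `mean`) are hypothetical stand-ins, not the asker's actual data:

```python
import pandas as pd

# Hypothetical toy frame standing in for the asker's data
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00"]
    ),
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
})
value_cols = ["a", "b"]

# axis=0 collapses the rows: one mean per column, indexed by "a" and "b"
col_means = df[value_cols].mean(axis=0)

# axis=1 collapses the columns: one mean per row, indexed by 0, 1, 2
row_means = df[value_cols].mean(axis=1)

# Assignment aligns on index labels. "a"/"b" match no row label,
# so every entry of the new column is NaN.
df["wrong"] = col_means

# The row means share the frame's index, so they line up as intended.
df["mean"] = row_means

print(df)
```

This suggests `skipna` is not the issue here: the values are computed correctly, but they are lost when the column-indexed result is aligned against the row index.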

2 Answers

You can use the following, which works for me:

df['mean'] = df.iloc[:, [1,2,3,4,5,6,7,8,9]].mean(axis=1)
  • Thanks, I had entered the wrong axis for calculating the mean. I was able to do it without the iloc as well. – user34482 Feb 23 '23 at 16:55
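The comment does not say how it was done without `iloc`, so the following is only one possible sketch. It assumes a hypothetical frame whose non-numeric column is called `timestamp`, and shows two ways to select the value columns by name or dtype instead of by position:

```python
import pandas as pd

# Hypothetical frame; the column names are assumptions for illustration
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]),
    "x": [1.0, 2.0],
    "y": [3.0, 5.0],
})

# Option 1: drop the timestamp column by name, then average across columns
mean_by_drop = df.drop(columns="timestamp").mean(axis=1)

# Option 2: keep only numeric columns, then average across columns
mean_by_dtype = df.select_dtypes(include="number").mean(axis=1)

df["mean"] = mean_by_drop
print(df)
```

Note that both options pick up every remaining numeric column, so if the code is rerun after `mean` has been added, that column would be averaged in as well.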
A working demo is as follows:

import pandas as pd

df_example = pd.DataFrame({
    'DateTime': ['2018-09-15 12:00:00', '2018-09-15 13:00:00', '2018-09-15 14:00:00', '2018-09-15 15:00:00', '2018-09-15 16:00:00', '2018-09-15 17:00:00', '2018-09-15 18:00:00', '2018-09-15 19:00:00', '2018-09-15 20:00:00'],   
    'Num1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Num2': [6, 6, 6, 6, 6, 6, 6, 6, 6]
})

df_example['Mean'] = df_example[['Num1', 'Num2']].mean(axis=1)

print(df_example)

Output

              DateTime  Num1  Num2  Mean
0  2018-09-15 12:00:00     1     6   3.5
1  2018-09-15 13:00:00     2     6   4.0
2  2018-09-15 14:00:00     3     6   4.5
3  2018-09-15 15:00:00     4     6   5.0
4  2018-09-15 16:00:00     5     6   5.5
5  2018-09-15 17:00:00     6     6   6.0
6  2018-09-15 18:00:00     7     6   6.5
7  2018-09-15 19:00:00     8     6   7.0
8  2018-09-15 20:00:00     9     6   7.5
Brian.Z
  • Thanks, it was happening because I entered the wrong axis. I used a similar approach to yours: I created a list of the columns to be averaged, stored it in a separate variable, and passed it to subset the df when calculating the mean. – user34482 Feb 23 '23 at 16:58
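The approach described in this comment might look roughly like the sketch below. The frame, the column names, and the variable `cols_to_average` are hypothetical. The example also includes rows with NaN to show what `skipna=True` does once the axis is correct:

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing readings
df = pd.DataFrame({
    "timestamp": ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00"],
    "s1": [1.0, np.nan, np.nan],
    "s2": [3.0, 4.0, np.nan],
})

# Columns to average, kept in a separate variable
cols_to_average = ["s1", "s2"]

# Row-wise mean; skipna=True (the default) ignores NaN within each row
df["mean"] = df[cols_to_average].mean(axis=1, skipna=True)

print(df)
```

A row with some missing values is averaged over the values that are present, while a row in which every selected value is NaN still yields NaN.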