df.mean() giving nan values even for non-nan input

Question

I am calculating the mean of each row in a data frame, but even when all the values in the row are valid numeric values, the mean is still coming out to be nan.

The dtype is float64 for all the columns, but the mean comes out as nan for every row (the last column in the image is the mean). I have excluded the first column (the timestamp column) from the calculation. I also tried specifying skipna=True, but that did not help.

The statement I used is df[].mean(axis=0,skipna=True). Some rows contain nan values, but they make up only a tiny fraction of the total row count, and I would expect skipna=True to take care of them.

Any reasons why this could be happening and how to proceed?

It's coming out `nan` because you computed column means, so the index of that resulting series is the column names. Then when you assign that to a column, it tries to align those column names with the row indexes and can't do it, so it just gives all `nan`. — tdy, Feb 23 '23 at 05:26
Related: [Why does `mean(axis=1)` compute row means instead of column means?](https://stackoverflow.com/q/22149584/13138364) — tdy, Feb 23 '23 at 05:41
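The alignment behaviour described in the comment above can be reproduced with a small sketch. The frame, the column names (`DateTime`, `Num1`, `Num2`) and the helper names `value_cols`, `mean_wrong` and `mean_right` are made up for illustration and are not taken from the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: a timestamp column plus float columns.
df = pd.DataFrame({
    "DateTime": ["2018-09-15 12:00:00", "2018-09-15 13:00:00", "2018-09-15 14:00:00"],
    "Num1": [1.0, 2.0, 3.0],
    "Num2": [6.0, 6.0, 6.0],
})
value_cols = ["Num1", "Num2"]  # assumed names of the columns to average

# axis=0 reduces down the rows: one mean per column, indexed by column name.
col_means = df[value_cols].mean(axis=0, skipna=True)

# Assigning that Series as a column aligns its labels ('Num1', 'Num2')
# against the row labels (0, 1, 2); nothing matches, so every row is NaN.
df["mean_wrong"] = col_means

# axis=1 reduces across the columns: one mean per row, indexed like the frame.
df["mean_right"] = df[value_cols].mean(axis=1, skipna=True)

print(df)
```

Here `mean_wrong` is NaN in every row even though the inputs contain no NaN, while `mean_right` holds the per-row means.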

Tornike Kharitonishvili · Answer 1 · 2023-02-23T07:07:18.397

0

You can use the following; it works for me:

df['mean'] = df.iloc[:, [1,2,3,4,5,6,7,8,9]].mean(axis=1)

edited Feb 23 '23 at 07:07

answered Feb 23 '23 at 07:02


Thanks I had entered the wrong axis for calculating the mean. Was able to do it without the iloc as well. – user34482 Feb 23 '23 at 16:55
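The comment does not show how the `iloc` call was avoided, so the following is only a sketch of two possible ways to do it, not the asker's actual code. The column names `timestamp`, `a` and `b` are hypothetical:

```python
import pandas as pd

# Hypothetical frame: one timestamp column followed by numeric columns.
df = pd.DataFrame({
    "timestamp": ["2023-02-23 05:00", "2023-02-23 06:00", "2023-02-23 07:00"],
    "a": [1.0, 2.0, 3.0],
    "b": [3.0, 4.0, 5.0],
})

# Option 1: drop the non-numeric column by name, then average across columns.
by_drop = df.drop(columns="timestamp").mean(axis=1)

# Option 2: keep only numeric columns, whatever they are called.
by_dtype = df.select_dtypes(include="number").mean(axis=1)

# Assign after computing so the new column is not included in its own average.
df["mean"] = by_dtype

print(df)
```

Both options select columns by label or dtype rather than by position, so they keep working if the column order changes.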

score 0 · Answer 2 · answered Feb 23 '23 at 08:42

A working demo is as follows:

import pandas as pd

df_example = pd.DataFrame({
    'DateTime': ['2018-09-15 12:00:00', '2018-09-15 13:00:00', '2018-09-15 14:00:00', '2018-09-15 15:00:00', '2018-09-15 16:00:00', '2018-09-15 17:00:00', '2018-09-15 18:00:00', '2018-09-15 19:00:00', '2018-09-15 20:00:00'],   
    'Num1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Num2': [6, 6, 6, 6, 6, 6, 6, 6, 6]
})

df_example['Mean'] = df_example[['Num1', 'Num2']].mean(axis=1)

print(df_example)

Output

              DateTime  Num1  Num2  Mean
0  2018-09-15 12:00:00     1     6   3.5
1  2018-09-15 13:00:00     2     6   4.0
2  2018-09-15 14:00:00     3     6   4.5
3  2018-09-15 15:00:00     4     6   5.0
4  2018-09-15 16:00:00     5     6   5.5
5  2018-09-15 17:00:00     6     6   6.0
6  2018-09-15 18:00:00     7     6   6.5
7  2018-09-15 19:00:00     8     6   7.0
8  2018-09-15 20:00:00     9     6   7.5
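The question also mentions rows that contain some nan values. A minimal sketch of how `skipna` behaves once the axis is correct, using a made-up frame `df_nan` (not the asker's data):

```python
import numpy as np
import pandas as pd

# Made-up rows: no NaN, one NaN, and all NaN.
df_nan = pd.DataFrame({
    "Num1": [1.0, np.nan, np.nan],
    "Num2": [6.0, 6.0, np.nan],
})

# skipna=True (the default) averages only the values that are present.
skipped = df_nan.mean(axis=1, skipna=True)

# skipna=False propagates NaN if any value in the row is missing.
strict = df_nan.mean(axis=1, skipna=False)

print(pd.DataFrame({"skipna=True": skipped, "skipna=False": strict}))
```

With `skipna=True`, a row mean is NaN only when every value in that row is missing, so a handful of nan cells should not turn the whole result into nan.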

Thanks - it was happening because I entered wrong axis. I used a similar approach to yours - created a list of columns to be averaged and stored it in a separate variable, and passed this to subset the df for calculating the mean. — user34482, Feb 23 '23 at 16:58
