Row-wise average for a subset of columns with missing values

Question

I've got a 'DataFrame` which has occasional missing values, and looks something like this:

          Monday         Tuesday         Wednesday 
      ================================================
Mike        42             NaN               12
Jenna       NaN            NaN               15
Jon         21              4                 1

I'd like to add a new column to my data frame where I'd calculate the average across all columns for every row.

Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday amt.']/1

Does anyone know the best way to account for this variation that results from missing values and calculate the average?

Stefan · Accepted Answer · 2016-01-12T04:35:40.173

178

You can simply:

df['avg'] = df.mean(axis=1)

       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667

because .mean() ignores missing values by default: see docs.

To select a subset, you can:

df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5

edited Jan 12 '16 at 04:35

answered Jan 12 '16 at 03:16

Stefan

41,759
13
76
81

That's great, thanks! Is there any way I can exclude a selection of the columns without creating a new data frame altogether, or would I have to create a new df out of a subset of df, run `df.mean(axis=1)`, and then merge that with the original data frame? – scrollex Jan 12 '16 at 04:31
@stallingOne https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – Stefan Jul 06 '22 at 19:13

score 13 · Answer 2 · answered Oct 09 '18 at 05:39

13

Alternative - using iloc (can also use loc here):

df['avg'] = df.iloc[:,0:2].mean(axis=1)

answered Oct 09 '18 at 05:39

Amir F

2,431
18
12

Larry Panozzo · Answer 3 · 2020-12-30T20:51:08.003

7

Resurrecting this Question because all previous answers currently print a Warning.

In most cases, use assign():

df = df.assign(avg=df.mean(axis=1))

For specific columns, one can input them by name:

df = df.assign(avg=df.loc[:, ["Monday", "Tuesday", "Wednesday"]].mean(axis=1))

Or by index, using one more than the last desired index as it is not inclusive:

df = df.assign(avg=df.iloc[:,0:3]].mean(axis=1))

edited Dec 30 '20 at 20:51

answered Dec 30 '20 at 20:09

Larry Panozzo

191
2
7

score -1 · Answer 4 · answered Apr 06 '23 at 00:27

-1

Using apply method:

df['avg'] = df[['Monday', 'Tuesday']].apply(np.avg, axis = 1)

answered Apr 06 '23 at 00:27

Gaurav Koradiya

338
4
8

Row-wise average for a subset of columns with missing values

4 Answers4

Linked

Related