pandas get column average/mean

Question

I can't get the average or mean of a column in pandas. I have a dataframe. Neither of the things I tried below gives me the average of the column weight:

>>> allDF 
         ID           birthyear  weight
0        619040       1962       0.1231231
1        600161       1963       0.981742
2      25602033       1963       1.3123124     
3        624870       1987       0.94212

The following returns several values, not one:

allDF[['weight']].mean(axis=1)

So does this:

allDF.groupby('weight').mean()

`df.groupby('weight')` wasn't what you wanted, because it splits the df into separate groups of rows, one for each distinct value of weight. Just use `df['weight'].mean()` instead. – smci Feb 16 '18 at 08:41
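To see why the groupby attempt returns several rows, here is a minimal sketch that rebuilds the frame from the question. The values and the name `allDF` come from the question; grouping by `birthyear` is only an illustrative assumption about a grouping that would make sense for this data.

```python
import pandas as pd

allDF = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.1231231, 0.981742, 1.3123124, 0.94212],
})

# Grouping by weight creates one group per distinct weight value,
# so the result has one row per distinct weight instead of one number.
per_weight = allDF.groupby("weight").mean()

# Selecting the column first and then calling mean() gives a single number.
overall = allDF["weight"].mean()

# groupby is the right tool for per-group means, e.g. mean weight per birth year.
per_year = allDF.groupby("birthyear")["weight"].mean()

print(len(per_weight))
print(overall)
print(per_year)
```

All four weights are distinct here, so `per_weight` has four rows, while `per_year` has three because two rows share the birth year 1963.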

score 424 · Accepted Answer · edited Jan 06 '23 at 03:32


If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]: 
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df.loc[:, 'weight'].mean()
Out[480]: 0.83982437500000007
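As a rough sketch of why the selection style matters, the snippet below rebuilds the frame shown in this answer (with the rounded values as displayed) and compares a Series selection with the one-column DataFrame selection used in the question. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

series_mean = df["weight"].mean()        # Series -> a single float
loc_mean = df.loc[:, "weight"].mean()    # the same Series, selected with .loc
frame_mean = df[["weight"]].mean()       # one-column DataFrame -> Series with one entry
row_means = df[["weight"]].mean(axis=1)  # one value per row, as in the question

print(series_mean)
print(frame_mean)
print(len(row_means))
```

Single brackets return a Series, whose mean is a scalar. Double brackets return a DataFrame, whose mean is itself a Series, and with `axis=1` that Series has one entry per row.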

edited Jan 06 '23 at 03:32 by questionto42 · answered Jun 24 '15 at 21:26 by DSM

and what if I wanted to get a mean of each and every column? – Chris Jun 11 '18 at 14:55

@Chris df.describe() – Abhishek Poojary Aug 01 '18 at 17:20

@Chris df.mean() gives you the mean of each column and returns it in a Series. – emschorsch Feb 22 '19 at 00:41

score 42 · Answer 2 · edited Mar 07 '19 at 16:35


Try df.mean(axis=0). The axis=0 argument calculates the column-wise mean of the dataframe, so the result has one value per column. axis=1 is the row-wise mean, which is why you are getting multiple values (one per row).
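The difference between the two axes can be sketched on a small made-up frame (the column names `x` and `y` and the values are hypothetical, chosen only to make the arithmetic easy to check):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

col_means = df.mean(axis=0)  # one value per column, indexed by column name
row_means = df.mean(axis=1)  # one value per row, indexed by row label

print(col_means)
print(row_means)
```

Here `col_means` has two entries (`x` and `y`), while `row_means` has three, one for each row.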

edited Mar 07 '19 at 16:35 by Soufiane S · answered Aug 08 '18 at 16:38 by Chandu

This works for most columns, but it will ignore any datetime columns. – user74696c Jul 19 '21 at 07:16

score 23 · Answer 3 · answered May 08 '18 at 06:14

Give print(df.describe()) a try. It gives an overall description of your dataframe, including the mean of each numeric column.

answered May 08 '18 at 06:14 by nainometer


`display(df.describe())` is better (in Jupyter Notebooks) because `display` from ipython provides formatted HTML rather than ASCII, which is more visually useful/pleasing. – Zhanwen Chen Apr 05 '19 at 16:28

score 17 · Answer 4 · answered Nov 25 '19 at 16:31

Mean for each column in df:

    A   B   C
0   5   3   8
1   5   3   9
2   8   4   9

df.mean()

A    6.000000
B    3.333333
C    8.666667
dtype: float64

and if you want the average of all values across all columns:

df.stack().mean()
6.0
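One caveat worth sketching: the mean of all values is not always the same as the mean of the column means. The frame below is hypothetical and contains missing values so that the two differ. It also swaps in `numpy.nanmean` on the underlying array rather than `stack()`, which is an assumption about what reads most simply, not what this answer used.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where column B has missing values.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, np.nan, np.nan]})

column_means = df.mean()                 # A -> 2.0, B -> 10.0 (NaN skipped)
mean_of_means = column_means.mean()      # average of the two column means
overall_mean = np.nanmean(df.to_numpy()) # average of the four non-missing values

print(mean_of_means)
print(overall_mean)
```

When every column has the same number of non-missing values, as in the example in this answer, both approaches give the same number.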

answered Nov 25 '19 at 16:31 by Hrvoje

score 14 · Answer 5 · answered Nov 28 '18 at 15:41

You can use

df.describe()

to get basic statistics of the dataframe. To get the mean of a specific column, you can use

df["columnname"].mean()
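The two approaches are connected: the result of `describe()` is itself a DataFrame, so a single statistic can be read out of it by label. A minimal sketch, assuming a frame shaped like the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

stats = df.describe()                             # rows: count, mean, std, min, ...
mean_from_describe = stats.loc["mean", "weight"]  # pick one cell by row and column label
direct_mean = df["weight"].mean()

print(mean_from_describe, direct_mean)
```

If only the mean is needed, calling `mean()` directly avoids computing the other statistics.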

answered Nov 28 '18 at 15:41 by Arun Singh

This is a duplicate of the answers mentioned above. – Mehdi Boukhechba Dec 12 '18 at 14:29

score 10 · Answer 6 · answered Jul 16 '19 at 22:53


You can also access a column using the dot notation (also called attribute access) and then calculate its mean:

df.your_column_name.mean()
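Attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute. The sketch below uses a hypothetical column called `size` to show the collision case:

```python
import pandas as pd

df = pd.DataFrame({"weight": [1.0, 2.0, 3.0], "size": [10, 20, 30]})

attr_mean = df.weight.mean()      # works: "weight" is not an existing attribute
shadowed = df.size                # NOT the column: DataFrame.size is the element count
bracket_mean = df["size"].mean()  # brackets always reach the column

print(attr_mean, shadowed, bracket_mean)
```

Here `df.size` returns the number of cells in the frame (6), so bracket or `.loc` selection is the safer choice when column names are not under your control.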

answered Jul 16 '19 at 22:53 by Nikos Tavoularis

Take `df.loc[:, 'your_column_name']` whenever you can. – questionto42 Jan 06 '23 at 03:27

@questionto42 why is that? Why use df.loc[:, 'weight'].mean() instead of df['weight'].mean()? – sparktime12 Feb 28 '23 at 10:15

@sparktime12 Both of the styles you wrote do the same thing, at the same speed. I just read (and vaguely remember) that `loc` is considered best practice, since it came later and is standardised for anything you would like to query, while the shortcut blurs the view of how filtering a df works. Check [What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?](https://stackoverflow.com/a/48411543/11154841). – questionto42 Feb 28 '23 at 16:43

score 6 · Answer 7 · edited May 28 '20 at 23:33


You can use either of the two statements below:

numpy.mean(df['col_name'])
# or
df['col_name'].mean()
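One place where pandas and plain NumPy can differ is missing data. The sketch below uses a hypothetical Series containing a NaN and compares `Series.mean()` with NumPy functions applied to the underlying array:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 3.0], name="col_name")

pandas_mean = s.mean()                     # skips NaN by default (skipna=True)
raw_numpy_mean = np.mean(s.to_numpy())     # NaN propagates on a plain ndarray
nan_aware_mean = np.nanmean(s.to_numpy())  # NumPy's NaN-skipping variant

print(pandas_mean, raw_numpy_mean, nan_aware_mean)
```

On data without missing values the two statements in this answer agree, so the distinction only matters once NaNs are involved.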

edited May 28 '20 at 23:33 by oo00oo00oo00 · answered Nov 26 '19 at 10:04 by davidbilla

Please, enrich your answer with proper comments. Otherwise it is likely to be marked for deletion – Don Nov 26 '19 at 11:14

score 3 · Answer 8 · answered May 26 '20 at 06:49

Additionally, if you want to round the value after finding the mean:

import pandas as pd

# Create a DataFrame
df1 = {
    'Subject': ['semester1', 'semester2', 'semester3', 'semester4', 'semester1',
                'semester2', 'semester3'],
    'Score': [62.73, 47.76, 55.61, 74.67, 31.55, 77.31, 85.47]}
df1 = pd.DataFrame(df1, columns=['Subject', 'Score'])

rounded_mean = round(df1['Score'].mean()) # specified nothing as decimal place
print(rounded_mean) # 62

rounded_mean_decimal_0 = round(df1['Score'].mean(), 0) # specified decimal place as 0
print(rounded_mean_decimal_0) # 62.0

rounded_mean_decimal_1 = round(df1['Score'].mean(), 1) # specified decimal place as 1
print(rounded_mean_decimal_1) # 62.2
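When several means need rounding at once, the `round` method on the resulting Series may be more convenient than the built-in `round`. This is a sketch reusing the data from this answer; the per-subject grouping is an added illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Subject": ["semester1", "semester2", "semester3", "semester4",
                "semester1", "semester2", "semester3"],
    "Score": [62.73, 47.76, 55.61, 74.67, 31.55, 77.31, 85.47],
})

# Mean of every numeric column, rounded to one decimal place in one step.
rounded_means = df1.mean(numeric_only=True).round(1)

# Mean score per subject, rounded the same way.
rounded_by_subject = df1.groupby("Subject")["Score"].mean().round(1)

print(rounded_means)
print(rounded_by_subject)
```

Passing `numeric_only=True` keeps the text column `Subject` out of the calculation.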

score 3 · Answer 9 · answered Nov 27 '20 at 11:16

Note that the column needs to have a numeric data type in the first place.

import pandas as pd
df['column'] = pd.to_numeric(df['column'], errors='coerce')

Next, find the mean of one column with mean(), or of all numeric columns with describe().

df['column'].mean()
df.describe()

Example of result from describe:

          column 
count    62.000000 
mean     84.678548 
std     216.694615 
min      13.100000 
25%      27.012500 
50%      41.220000 
75%      70.817500 
max    1666.860000
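To make the effect of `errors='coerce'` concrete, here is a small sketch with a hypothetical text column in which one entry cannot be parsed:

```python
import pandas as pd

# Hypothetical column read in as strings, with one unparseable entry.
df = pd.DataFrame({"column": ["13.1", "27.5", "n/a", "41.2"]})

df["column"] = pd.to_numeric(df["column"], errors="coerce")  # "n/a" becomes NaN

n_missing = df["column"].isna().sum()  # how many values were coerced to NaN
mean_value = df["column"].mean()       # NaN is skipped by default

print(df["column"].dtype, n_missing, mean_value)
```

Checking `n_missing` after the conversion is a cheap way to notice how many values were silently dropped from the mean.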

Take df.loc[:, 'your_column_name'] whenever you can. – questionto42 Jan 06 '23 at 03:30

score 2 · Answer 10 · answered Jun 12 '20 at 12:03

You can simply go for df.describe(), which will provide you with all the relevant details you need. To find the min, max or average value of a particular column (say 'weight' in your case), use:

    df['weight'].mean(): For average value
    df['weight'].max(): For maximum value
    df['weight'].min(): For minimum value

score 2 · Answer 11 · answered Oct 03 '22 at 20:15


You can use the method agg (aggregate):

df.agg('mean')

It's possible to apply multiple statistics:

df.agg(['mean', 'max', 'min'])
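The same method also works on a single column, and it accepts a dict when different columns need different statistics. A minimal sketch, assuming a frame shaped like the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

# Several statistics for one column: returns a Series indexed by statistic name.
weight_stats = df["weight"].agg(["mean", "max", "min"])

# Different statistics per column: returns a DataFrame.
per_column = df.agg({"weight": ["mean", "max"], "birthyear": ["min", "max"]})

print(weight_stats)
print(per_column)
```

In `per_column`, combinations that were not requested (such as the mean of `birthyear`) are not computed.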

answered Oct 03 '22 at 20:15 by Mykola Zotko

score -2 · Answer 12 · edited Feb 04 '21 at 08:17

You can follow the code below:

import pandas as pd

classxii = {'Name':['Karan','Ishan','Aditya','Anant','Ronit'],
            'Subject':['Accounts','Economics','Accounts','Economics','Accounts'],
            'Score':[87,64,58,74,87],
            'Grade':['A1','B2','C1','B1','A2']}

df = pd.DataFrame(classxii,index = ['a','b','c','d','e'],columns=['Name','Subject','Score','Grade'])
print(df)

#use the below for mean if you already have a dataframe
print('mean of score is:')
print(df[['Score']].mean())
