
I want to compute the mean of the variables in a DataFrame after grouping the rows by one column ('country' in the code below). The problem is that when I print the result, the output contains the mean of only the first column (age). Why does this happen?

Code:

import pandas as pd

file = open('C:/Users/Andre/Desktop/Python/introduction-datascience-python-book-master/files/ch03/adult.data', 'r')

def chr_int(a):
    if a.isdigit(): return int(a)
    else:
        return a

data = []
for line in file:
    data1 = line.split(',')  
    if len(data1) == 15: 
        data.append([chr_int(data1[0]), data1[1], chr_int(data1[2]), data1[3], chr_int(data1[4]), data1[5], data1[6],
                        data1[7], data1[8], data1[9], chr_int(data1[10]), chr_int(data1[11]),
                        chr_int(data1[12]), data1[13], data1[14]])

df = pd.DataFrame(data)
df.columns = [ 'age', 'type_employer', 'fnlwgt', 'education',
                'education_num', 'marital', 'occupation',
                'relationship', 'race', 'sex', 'capital_gain',
                'capital_loss', 'hr_per_week', 'country', 'income' ]

#print(df) 

counts = df.groupby('country').mean()  
print(counts.head())    

Output:

                 age
country             
 ?         38.725557
 Cambodia  37.789474
 Canada    42.545455
 China     42.533333
 Columbia  39.711864
Andrea
  • Can you provide some sample data that reproduces your problem (see [how to make a pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples))? `mean` only aggregates numeric values, so if there's only one numeric column, that's all that will appear in the output... – ALollz May 07 '20 at 02:40
  • https://stackoverflow.com/questions/40066837/pandas-get-average-of-a-groupby – Joe May 07 '20 at 04:51
  • https://stackoverflow.com/questions/41040132/pandas-groupby-count-and-mean-combined – Joe May 07 '20 at 04:51

1 Answer


A more intuitive way to use pandas.DataFrame.groupby()

Try it this way; it gives neat, presentable code as well as the answer you want.

Use: df.groupby('column_to_group').agg({'col_to_mean': 'mean', 'col_to_sum': 'sum'})

To group by several columns, replace the single 'column_to_group' with a list of the columns to group by.

Example: df.groupby(['group_col_1', 'group_col_2']).agg({'col_to_mean': 'mean', 'col_to_sum': 'sum'})
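To make the pattern above concrete, here is a minimal sketch on a toy DataFrame. The column names are borrowed from the question, but the values are made up for illustration and are not taken from adult.data.

```python
import pandas as pd

# Toy data: values are invented for illustration only.
df = pd.DataFrame({
    "country": ["Canada", "Canada", "China", "China"],
    "age": [40, 44, 30, 50],
    "hr_per_week": [40, 20, 60, 40],
})

# One aggregation per column: mean of age, sum of hr_per_week, per country.
result = df.groupby("country").agg({"age": "mean", "hr_per_week": "sum"})
print(result)
```

The dictionary keys are the columns to aggregate and the values are the aggregation names, so each column can get its own function.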

Make sure you don't use the same column both to group and to aggregate. Cheers!

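PS: To restrict the aggregation to particular data types, use df.select_dtypes(); its 'include' and 'exclude' parameters choose which dtypes to keep or drop, based on your requirements.

Example: df.groupby(['group_col_1', 'group_col_2'])[df.select_dtypes(include='number').columns.tolist()].mean() (this assumes the grouping columns themselves are not numeric; calling groupby directly on the result of select_dtypes() would fail if the grouping columns were filtered out)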
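Because select_dtypes() drops every column that does not match, a text grouping column would be missing from its result. One way around this, sketched here on made-up data, is to use select_dtypes() only to collect the numeric column names and then group the full DataFrame. The sample values and the choice of include="number" are assumptions for illustration.

```python
import pandas as pd

# Toy data: fnlwgt holds digits stored as strings, so it is not numeric.
df = pd.DataFrame({
    "country": ["Canada", "Canada", "China"],
    "sex": ["Male", "Female", "Male"],
    "age": [40, 44, 30],
    "fnlwgt": ["77516", "83311", "215646"],
})

# Collect the names of the numeric columns only.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# Group the full DataFrame, then aggregate just the numeric columns.
means = df.groupby("country")[numeric_cols].mean()
print(means)
```

Printing numeric_cols is also a quick way to see which columns pandas actually treats as numeric, which is the root of the problem in the question.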

  • Thank you. I found some values that were not integers, so I converted them and now the program runs correctly. – Andrea May 07 '20 at 06:56
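For later readers: the country labels in the question's output start with a space (' ?', ' Cambodia'), which suggests that every field after the first keeps a leading space after split(','). If that is the case, a string such as ' 77516' fails isdigit(), so chr_int leaves it as text and only age ends up numeric. A minimal sketch of one possible conversion, using made-up rows and assuming that leading spaces are indeed the cause:

```python
import pandas as pd

# Toy rows mimicking fields that kept a leading space; values are invented.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "fnlwgt": [" 77516", " 83311", " 215646"],
    "hr_per_week": [" 40", " 13", " 40"],
    "country": [" Cuba", " Cuba", " India"],
})
print(df.dtypes)  # before conversion, only age is an integer column

# Strip the whitespace, then convert the digit strings to numbers.
for col in ["fnlwgt", "hr_per_week"]:
    df[col] = pd.to_numeric(df[col].str.strip())

# Clean the grouping labels too, so ' Cuba' becomes 'Cuba'.
df["country"] = df["country"].str.strip()

means = df.groupby("country")[["age", "fnlwgt", "hr_per_week"]].mean()
print(means)
```

Calling strip() on each field inside chr_int before the isdigit() check would be an equivalent fix at parse time.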