
Let's say I have a pandas DataFrame like this:

import pandas as pd


a=pd.Series({'Country':'Italy','Name':'Augustina','Gender':'Female','Number':1})
b=pd.Series({'Country':'Italy','Name':'Piero','Gender':'Male','Number':2})
c=pd.Series({'Country':'Italy','Name':'Carla','Gender':'Female','Number':3})
d=pd.Series({'Country':'Italy','Name':'Roma','Gender':'Female','Number':4})
e=pd.Series({'Country':'Greece','Name':'Sophia','Gender':'Female','Number':5})
f=pd.Series({'Country':'Greece','Name':'Zeus','Gender':'Male','Number':6})

df=pd.DataFrame([a,b,c,d,e,f])

Then I set a MultiIndex on it, like this:

df.set_index(['Country','Gender'],inplace=True)

Now I would like to know how to count how many people are from Italy, or how many Greek females there are in the DataFrame.

I've tried

df['Italy'].count()

and

df['Greece']['Female'].count()

Neither of them works.

Thanks

MatMorPau22
  • I'm assuming that your actual code doesn't have all the syntax errors, correct? – DeepSpace Feb 12 '17 at 12:10
  • Yes, sure, I made up the code. My problem is what I'm asking, but the data is different. I've just realised that the pd.Series lines have syntax errors. – MatMorPau22 Feb 12 '17 at 18:08

1 Answer


I think you need groupby with the aggregating function size.

See also: What is the difference between size and count in pandas?
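To make the size versus count distinction concrete, here is a minimal sketch. The `people` frame and its missing `Name` value are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Name is deliberately missing
people = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Greece'],
    'Name': ['Piero', np.nan, 'Zeus'],
})

grouped = people.groupby('Country')

# size() counts the rows in each group, missing values included
sizes = grouped.size()

# count() counts only the non-null values of a column in each group
name_counts = grouped['Name'].count()

print(sizes)
print(name_counts)
```

So if the goal is "how many rows belong to each group", size is usually the safer choice, because it does not depend on which column happens to contain NaN.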

a=pd.DataFrame([{'Country':'Italy','Name':'Augustina','Gender':'Female','Number':1}])
b=pd.DataFrame([{'Country':'Italy','Name':'Piero','Gender':'Male','Number':2}])
c=pd.DataFrame([{'Country':'Italy','Name':'Carla','Gender':'Female','Number':3}])
d=pd.DataFrame([{'Country':'Italy','Name':'Roma','Gender':'Female','Number':4}])
e=pd.DataFrame([{'Country':'Greece','Name':'Sophia','Gender':'Female','Number':5}])
f=pd.DataFrame([{'Country':'Greece','Name':'Zeus','Gender':'Male','Number':6}])

df=pd.concat([a,b,c,d,e,f], ignore_index=True)
print (df)
  Country  Gender       Name  Number
0   Italy  Female  Augustina       1
1   Italy    Male      Piero       2
2   Italy  Female      Carla       3
3   Italy  Female       Roma       4
4  Greece  Female     Sophia       5
5  Greece    Male       Zeus       6

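# keep df unchanged and store the result under a new name,
# otherwise the next groupby would run on a Series
counts = df.groupby('Country').size()
print (counts)
Country
Greece    2
Italy     4
dtype: int64

counts = df.groupby(['Country', 'Gender']).size()
print (counts)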
Country  Gender
Greece   Female    1
         Male      1
Italy    Female    3
         Male      1
dtype: int64
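The grouped sizes above are ordinary Series, so the two numbers asked for in the question can be read off by label. A short sketch, rebuilding the same data under the assumed name `people` so that it runs on its own:

```python
import pandas as pd

# Same rows as in the question, rebuilt here so the snippet is self-contained
people = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Italy', 'Italy', 'Greece', 'Greece'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'Name': ['Augustina', 'Piero', 'Carla', 'Roma', 'Sophia', 'Zeus'],
})

by_country = people.groupby('Country').size()
by_country_gender = people.groupby(['Country', 'Gender']).size()

# Select a single count by its index label
italy_total = by_country.loc['Italy']

# For the two-level result, the label is a (Country, Gender) tuple
greek_females = by_country_gender.loc[('Greece', 'Female')]

print(italy_total, greek_females)
```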

If you need only some of the sizes, select from the MultiIndex with xs or slicers:

df.set_index(['Country','Gender'],inplace=True)
print (df)
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Male        Piero       2
        Female      Carla       3
        Female       Roma       4
Greece  Female     Sophia       5
        Male         Zeus       6

print (df.xs('Italy', level='Country'))
             Name  Number
Gender                   
Female  Augustina       1
Male        Piero       2
Female      Carla       3
Female       Roma       4

print (len(df.xs('Italy', level='Country').index))
4

print (df.xs(('Greece', 'Female'), level=('Country', 'Gender')))
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.xs(('Greece', 'Female'), level=('Country', 'Gender')).index))
1

# Slicers need a sorted index; without sort_index the next lines raise
# KeyError: 'MultiIndex Slicing requires
# the index to be fully lexsorted tuple len (2), lexsort depth (0)'
df.sort_index(inplace=True)
idx = pd.IndexSlice
print (df.loc[idx['Italy', :],:])
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Female      Carla       3
        Female       Roma       4
        Male        Piero       2

print (len(df.loc[idx['Italy', :],:].index))
4

print (df.loc[idx['Greece', 'Female'],:])
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.loc[idx['Greece', 'Female'],:].index))
1
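If the frame already has the MultiIndex set, it should also be possible to count without xs or slicers, by grouping on the index levels or by comparing the level values directly. This is a sketch of that alternative, not part of the approach above; `people` is an assumed name for the same data.

```python
import pandas as pd

# Same rows as in the question, already indexed by Country and Gender
people = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Italy', 'Italy', 'Greece', 'Greece'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'Name': ['Augustina', 'Piero', 'Carla', 'Roma', 'Sophia', 'Zeus'],
}).set_index(['Country', 'Gender'])

# Group on the index levels instead of on columns
level_counts = people.groupby(level=['Country', 'Gender']).size()
greek_females = level_counts.loc[('Greece', 'Female')]

# Boolean mask on one level; this does not need a sorted index
italy_total = (people.index.get_level_values('Country') == 'Italy').sum()

print(italy_total, greek_females)
```

The mask variant avoids the lexsort requirement because it never slices the index; it only compares the labels of one level.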
Graham
jezrael