I have a dataframe with 4 columns (including Name, Gender, and Number of births), and I want to calculate the total number of births for females and for males, so I need something like a groupby on gender. I've already done it with pandas like this:

names1880.groupby(['Gender']).sum()

And I got output like this:

Gender | Births
-------|-------
F      | 166868
M      | 120851

But now I have to do it with NumPy instead of pandas.

Haleemur Ali

3 Answers

numpy.bincount is a good tool for this if you use the birth counts as weights. But bincount requires that the categories be integers. So you can do this in numpy if you first create an array with different integers for each gender, like this:

import pandas as pd
import numpy as np

names1880 = pd.DataFrame({
    'Name': ['Walter', 'Roger', 'Jane', 'Imelda'],
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Births': [100, 200, 120, 220]
})

gender_names, gender_codes = np.unique(
    names1880['Gender'], return_inverse=True
)
print(gender_names)
print(np.bincount(gender_codes, weights=names1880['Births']))

# ['Female' 'Male']
# [340. 300.]
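
To make the two steps easier to follow, here is a minimal sketch using plain NumPy arrays only. The `gender` and `births` arrays are hypothetical sample data, not the real 1880 file. It prints the integer codes that `np.unique(..., return_inverse=True)` produces and then the weighted `np.bincount` totals.

```python
import numpy as np

# Hypothetical sample data standing in for the Gender and Births columns
gender = np.array(["M", "M", "F", "F", "M"])
births = np.array([100, 200, 120, 220, 50])

# labels holds the sorted unique categories; codes maps each row to its label index
labels, codes = np.unique(gender, return_inverse=True)
print(labels)  # ['F' 'M']
print(codes)   # [1 1 0 0 1]

# With weights, bincount adds up the births per code instead of counting rows
totals = np.bincount(codes, weights=births)

# bincount returns floats when weights are given, so cast back to int for display
totals_by_gender = dict(zip(labels.tolist(), totals.astype(int).tolist()))
print(totals_by_gender)  # {'F': 340, 'M': 350}
```

Position `i` in `totals` lines up with `labels[i]`, which is why the two can be zipped together.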
Matthias Fripp
  • Thanks, but it doesn't work. I got this error: ValueError: could not convert string to float: 'F'. And I need a sum of the values, not a count :'( – Sarah Kraiem Mar 14 '21 at 03:56
  • @SarahKraiem I added some code to categorize the genders first, so bincount can work with them. Note that bincount will calculate a sum of the birth counts if you use the birth counts as weights. – Matthias Fripp Mar 14 '21 at 06:45

Referencing this SO answer for converting pandas DataFrame to numpy structured array: https://stackoverflow.com/a/51280608/42346

import collections as co
import numpy as np
import pandas as pd

# my simple test DataFrame
df = pd.DataFrame(co.OrderedDict([('Name',['test','test2','test3']),
                                  ('Gender',['female','male','female']),
                                  ('Number of births',[2,3,4])])) 

# convert pandas DataFrame to numpy structured array 
# credit to this SO answer: https://stackoverflow.com/a/51280608/42346
sarr = np.array([tuple(x) for x in df.values], 
                dtype=list(zip(df.dtypes.index, df.dtypes)))

# finally get the totals by group
for k in np.unique(sarr['Gender']):
    # boolean mask selects the rows of this group; .sum() is NumPy's own reduction
    print(k, sarr[sarr['Gender'] == k]['Number of births'].sum())

Result:

female 6
male 3
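
As a possible variation on the loop above, the per-group sums can also be accumulated in one pass with `np.add.at`. This is only a sketch: the structured array is built by hand with hypothetical fixed-width string fields (`U10`) rather than converted from the DataFrame, so it runs without pandas.

```python
import numpy as np

# Hypothetical records mirroring the small test DataFrame above
sarr = np.array(
    [("test", "female", 2), ("test2", "male", 3), ("test3", "female", 4)],
    dtype=[("Name", "U10"), ("Gender", "U10"), ("Number of births", "i8")],
)

# idx gives, for every row, the position of its gender in groups
groups, idx = np.unique(sarr["Gender"], return_inverse=True)

# np.add.at performs an unbuffered in-place add, so repeated indices accumulate
totals = np.zeros(len(groups), dtype=np.int64)
np.add.at(totals, idx, sarr["Number of births"])

for name, total in zip(groups, totals):
    print(name, total)
```

This keeps the totals as integers, whereas a weighted `np.bincount` would return floats.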
mechanical_meat

In the end I simply did this to count all births:

All_births = names1880.sum()

and to filter by gender I did this:

names1880[names1880['Gender'] == "F"].sum()

and

names1880[names1880['Gender'] == "M"].sum()
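
The same boolean-mask idea carries over to plain NumPy arrays. The sketch below uses hypothetical `gender` and `births` arrays; with the real dataframe they could presumably be obtained by calling `.to_numpy()` on the corresponding columns.

```python
import numpy as np

# Hypothetical arrays standing in for the Gender and births columns
gender = np.array(["F", "M", "F", "M"])
births = np.array([10, 20, 30, 40])

# Total over all rows
all_births = births.sum()

# gender == "F" is a boolean array; indexing with it keeps only matching rows
female_births = births[gender == "F"].sum()
male_births = births[gender == "M"].sum()

print(all_births, female_births, male_births)
```

Summing only the births array avoids also summing the Name and Gender columns, which is what calling `.sum()` on the whole dataframe does.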