How to compute a sum satisfying a condition with NumPy, not with Pandas

Question

I have a DataFrame with 4 columns (Name, Gender, Number of births), and I want to calculate the total number of births for females and for males, i.e. something like a groupby on gender. I've already done it with pandas like this:

names1880.groupby(['Gender']).sum()

and I got output like this:

Gender | Births
-------|-------
F      | 166868
M      | 120851

But now I have to do it with NumPy, not with Pandas.

Also see if this helps https://stackoverflow.com/questions/4373631/sum-array-by-number-in-numpy — Joe Ferndz, Mar 14 '21 at 04:13

Matthias Fripp · Answer 1 · score 1 · 2021-03-14T07:40:07.470

numpy.bincount is a good tool for this if you use the birth counts as weights. However, bincount only accepts non-negative integers as the values to bin, so the gender strings must first be mapped to integer codes. You can do that in NumPy by creating an array with a different integer for each gender, like this:

import pandas as pd
import numpy as np

names1880 = pd.DataFrame({
    'Name': ['Walter', 'Roger', 'Jane', 'Imelda'],
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Births': [100, 200, 120, 220]
})

gender_names, gender_codes = np.unique(
    names1880['Gender'], return_inverse=True
)
print(gender_names)
print(np.bincount(gender_codes, weights=names1880['Births']))

# ['Female' 'Male']
# [340. 300.]
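As an alternative to bincount (a different technique, not part of the answer above), `np.add.at` performs an unbuffered scatter-add into a preallocated array, so repeated gender codes accumulate instead of overwriting each other. A minimal sketch, assuming the same toy data held in plain NumPy arrays rather than a DataFrame; unlike bincount with weights, the totals keep the integer dtype of the births:

```python
import numpy as np

# Same toy data as above, as plain NumPy arrays (no pandas needed)
genders = np.array(['Male', 'Male', 'Female', 'Female'])
births = np.array([100, 200, 120, 220])

# Map each gender string to an integer code: 'Female' -> 0, 'Male' -> 1
gender_names, gender_codes = np.unique(genders, return_inverse=True)

# One accumulator slot per gender, same integer dtype as the births
totals = np.zeros(len(gender_names), dtype=births.dtype)

# Add each birth count into the slot for its gender code.
# np.add.at is unbuffered, so repeated codes accumulate correctly.
np.add.at(totals, gender_codes, births)

for name, total in zip(gender_names, totals):
    print(name, total)
# Female 340
# Male 300
```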

edited Mar 14 '21 at 07:40

answered Mar 14 '21 at 03:43

Matthias Fripp

Thanks, but it doesn't work. I got this error: ValueError: could not convert string to float: 'F'. Also, I need a sum of values, not a count :'( – Sarah Kraiem Mar 14 '21 at 03:56
@SarahKraiem I added some code to categorize the genders first, so bincount can work with them. Note that bincount will calculate a sum of the birth counts if you use the birth counts as weights. – Matthias Fripp Mar 14 '21 at 06:45
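The error in the comment above is most likely what NumPy raises when string values such as 'F' reach an argument it must convert to float, for example bincount's weights. The sketch below applies the answer's approach to data shaped like the question's. It assumes the columns are named 'Gender' (values 'F'/'M') and 'Births', as in the groupby output; the DataFrame and its values are hypothetical toy data standing in for names1880:

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for names1880 (assumed column names 'Gender', 'Births')
names1880 = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Births': [10, 20, 30, 40],
})

# Pull the columns out as plain NumPy arrays; from here on it is NumPy only
genders = names1880['Gender'].to_numpy()
births = names1880['Births'].to_numpy()

# bincount needs integer bin labels, so encode 'F'/'M' as 0/1 first
gender_names, codes = np.unique(genders, return_inverse=True)

# Using the births as weights makes bincount sum them per code (result is float)
totals = np.bincount(codes, weights=births)

for name, total in zip(gender_names, totals):
    print(name, int(total))
# F 40
# M 60
```

The key point is that the strings only ever go into `np.unique`; bincount receives the integer codes as its first argument and the numeric birth counts as `weights`.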

score 0 · Answer 2 · answered Mar 14 '21 at 04:18

This uses the approach from this SO answer to convert a pandas DataFrame to a NumPy structured array: https://stackoverflow.com/a/51280608/42346

import collections as co, numpy as np, pandas as pd

# my simple test DataFrame
df = pd.DataFrame(co.OrderedDict([('Name',['test','test2','test3']),
                                  ('Gender',['female','male','female']),
                                  ('Number of births',[2,3,4])])) 

# convert pandas DataFrame to numpy structured array 
# credit to this SO answer: https://stackoverflow.com/a/51280608/42346
sarr = np.array([tuple(x) for x in df.values], 
                dtype=list(zip(df.dtypes.index, df.dtypes)))

# finally get the totals by group
for k in np.unique(sarr['Gender']): 
    print(k, sum(sarr[sarr['Gender'] == k]['Number of births']))

Result:

female 6
male 3

score 0 · Answer 3 · answered Mar 14 '21 at 04:42

In the end I simply did this. To sum all births:

All_births = names1880.sum()

and to filter by gender, I did this:

names1880[names1880['Gender'] == "F"].sum()

and

names1880[names1880['Gender'] == "M"].sum()
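The same filtering can be done in NumPy itself, which is what the question asked for: comparing the gender array to 'F' or 'M' yields a boolean mask, and summing only the masked birth counts gives the per-gender total. A minimal sketch, assuming the gender and birth columns have already been extracted as arrays (the values below are hypothetical toy data):

```python
import numpy as np

# Hypothetical toy data standing in for the Gender and Births columns
genders = np.array(['F', 'M', 'F', 'M'])
births = np.array([10, 20, 30, 40])

all_births = births.sum()                     # total over all rows
female_births = births[genders == 'F'].sum()  # boolean mask keeps only 'F' rows
male_births = births[genders == 'M'].sum()    # boolean mask keeps only 'M' rows

# Same result without building a filtered copy: np.sum's `where` argument
female_births_where = np.sum(births, where=(genders == 'F'))

print(all_births, female_births, male_births)
# 100 40 60
```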

answered Mar 14 '21 at 04:42

Sarah Kraiem