0

Using this code:

import pandas

df = pandas.DataFrame()

df['id']     = list('aaabbbcccdddeee')
df['gender'] = list('mmfmfmmffmfmfff') 

I get a DataFrame like:

   id gender
0   a      m
1   a      m
2   a      f
3   b      m
4   b      f
5   b      m
6   c      m
7   c      f
8   c      f
9   d      m
10  d      f
11  d      m
12  e      f
13  e      f
14  e      f

How can I split df by id and then count the number of m and f in each category so that I can get a result that looks something like:

    m    f
a   2    1
b   2    1
c   1    2
d   2    1
e   0    3
jpp
  • 159,742
  • 34
  • 281
  • 339
user1367204
  • 4,549
  • 10
  • 49
  • 78
  • Easiest; `df.groupby(['id', 'gender']).gender.count().unstack(fill_value=0)`... if you need more performance, try `pivot` or `crosstab`. – cs95 May 18 '18 at 21:01
  • I feel like if it's a duplicate you should link to the duplicated version. Thanks for your answer. – user1367204 May 18 '18 at 21:06
  • I did, the linked answer is here https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – cs95 May 18 '18 at 21:13
  • Don't worry... it isn't a bad thing to close as duplicate. Also, you're welcome! – cs95 May 18 '18 at 21:14

1 Answers1

0

You can use pd.pivot_table with len:

res = df.pivot_table(index='id', columns='gender',
                     aggfunc=len, fill_value=0)

print(res)

gender  f  m
id          
a       1  2
b       1  2
c       2  1
d       1  2
e       3  0
jpp
  • 159,742
  • 34
  • 281
  • 339
  • I don't hold the belief that a difference in agg functions is a cause to answer the question despite it being a duplicate in every other respect _anyway_, but I'll defer to your better judgement because I respect your opinion. – cs95 May 18 '18 at 21:21
  • @coldspeed, It's an interesting question I raised [here](https://meta.stackoverflow.com/questions/365180/duplicates-in-the-context-of-data-manipulation). Unfortunately, no answers forthcoming :(. – jpp May 18 '18 at 22:04