Groupby within groups

Question

I have data like this:

df = pd.DataFrame({
    'a': ['milk', 'eggs', 'eggs', 'butter', 'butter',
          'milk', 'eggs', 'eggs', 'butter', 'butter'],
    'b': ['billy', 'bob', 'frank', 'frank', 'sue',
          'frank', 'sue', 'sue', 'sue', 'sue'],
    'c': ['1/30', '1/30', '1/31', '1/31', '1/31',
          '3/31', '3/31', '3/31', '5/31', '5/31'],
}, index=list('ABCDEFGHIJ'))

For each row, I want the inverse of the number of distinct values of c for that row's value of b. Billy and Bob each have one distinct value in c, so their values are both 1. Frank has two distinct dates, so his is 0.5, and so on.

Desired output:

A    1.000000
B    1.000000
C    0.500000
D    0.500000
E    0.333333
F    0.500000
G    0.333333
H    0.333333
I    0.333333
J    0.333333
dtype: float64

I think I need to manipulate groupby(some group).count() and/or groupby(some group).transform('count'), but I'm not sure how to manipulate them and what else I need (if anything) - or if there's a better way.

I tried variations on

df.groupby(['b', 'c'], as_index=False)['c'].transform('count').reset_index()

(based on aggregating within a groupby), to no avail.

I could probably figure out an "ugly" way but I'd very much like to know how to do this in 1-2 lines (if possible).

Thanks!

ddejohn · Answer 1 · 2022-02-12T17:08:32.537

1

I'm sure there's a better way, since I'm really unfamiliar with anything beyond the basics of pandas, but this seems to do what you want:

df.merge(pd.DataFrame(1 / df.groupby("b")["c"].nunique()).reset_index(), on="b").set_index(df.index)

Output:

        a      b   c_x       c_y
A    milk  billy  1/30  1.000000
B    eggs    bob  1/30  1.000000
C    eggs  frank  1/31  0.500000
D  butter  frank  1/31  0.500000
E    milk  frank  3/31  0.500000
F  butter    sue  1/31  0.333333
G    eggs    sue  3/31  0.333333
H    eggs    sue  3/31  0.333333
I  butter    sue  5/31  0.333333
J  butter    sue  5/31  0.333333

edited Feb 12 '22 at 17:08

answered Feb 12 '22 at 17:03

ddejohn


This answer gets me close, but there is a problem: the merge reorders the rows, so set_index(df.index) assigns the index labels to the wrong rows. See https://stackoverflow.com/questions/20206615/how-can-a-pandas-merge-preserve-order/28334396 – Chris Coffee Feb 12 '22 at 17:43
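
One possible way around the ordering problem, offered as a sketch rather than as the answer author's fix: compute the per-name inverse once, then look it up row by row with Series.map, which keeps the original index and involves no merge. The names `inv` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['milk', 'eggs', 'eggs', 'butter', 'butter',
          'milk', 'eggs', 'eggs', 'butter', 'butter'],
    'b': ['billy', 'bob', 'frank', 'frank', 'sue',
          'frank', 'sue', 'sue', 'sue', 'sue'],
    'c': ['1/30', '1/30', '1/31', '1/31', '1/31',
          '3/31', '3/31', '3/31', '5/31', '5/31'],
}, index=list('ABCDEFGHIJ'))

# One value per name in b: 1 / (number of distinct dates in c)
inv = 1 / df.groupby("b")["c"].nunique()

# Look up each row's name in that Series; the result keeps df's index
result = df["b"].map(inv)
print(result)
```

Because map works row by row on df["b"], the labels A through J stay attached to the rows they started on.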

score 1 · Accepted Answer · answered Feb 13 '22 at 02:45


groupby.transform should suffice:

1 / df.groupby("b")["c"].transform("nunique")
A    1.000000
B    1.000000
C    0.500000
D    0.500000
E    0.333333
F    0.500000
G    0.333333
H    0.333333
I    0.333333
J    0.333333
Name: c, dtype: float64
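
For anyone who wants to run this end to end, here is a self-contained version using the question's data. transform("nunique") broadcasts each group's distinct count back onto every row of that group, so the result is already aligned with the original index. The name `weights` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['milk', 'eggs', 'eggs', 'butter', 'butter',
          'milk', 'eggs', 'eggs', 'butter', 'butter'],
    'b': ['billy', 'bob', 'frank', 'frank', 'sue',
          'frank', 'sue', 'sue', 'sue', 'sue'],
    'c': ['1/30', '1/30', '1/31', '1/31', '1/31',
          '3/31', '3/31', '3/31', '5/31', '5/31'],
}, index=list('ABCDEFGHIJ'))

# Distinct dates per name, repeated on every row belonging to that name
distinct_dates = df.groupby("b")["c"].transform("nunique")

# Inverse of that count, one value per original row
weights = 1 / distinct_dates
print(weights)
```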

answered Feb 13 '22 at 02:45

sammywemmy
