
So basically this question is equivalent to either 17709270 or 29791785, with the difference that I want to count values based on multiple columns instead of just one.

I have, for example:

    column1 column2 column3 column4 
1   a       2       True    asdmn
2   b       2       False   asdd
3   c       3       False   asddas
4   a       2       False   grtgv
5   b       1       False   bdfbf

and my result should be

    column1 column2 column3 column4  counts
1   a       2       True    asdmn    2
2   b       2       False   asdd     1
3   c       3       False   asddas   1
4   a       2       False   grtgv    2
5   b       1       False   bdfbf    1

If I am not mistaken, none of the responses to the previously referenced questions work for this case.

J Agustin Barrachina

3 Answers


You can do:

df['counts'] = df.groupby(['column1','column2'])['column1'].transform('size')
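For reference, a minimal runnable sketch of this approach on the question's example data (`column3`/`column4` included only to mirror the question):

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "column1": ["a", "b", "c", "a", "b"],
    "column2": [2, 2, 3, 2, 1],
    "column3": [True, False, False, False, False],
    "column4": ["asdmn", "asdd", "asddas", "grtgv", "bdfbf"],
}, index=[1, 2, 3, 4, 5])

# transform('size') broadcasts each group's size back to every row,
# so the result aligns with the original index
df["counts"] = df.groupby(["column1", "column2"])["column1"].transform("size")
print(df["counts"].tolist())  # [2, 1, 1, 2, 1]
```

The key point is that `transform` returns a Series aligned to the original index, whereas a plain `groupby(...).size()` would return one row per group.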
Quang Hoang

With value_counts, you could do:

cols = ["column1", "column2"]
df = df.assign(counts=df.set_index(cols).index.map(df[cols].value_counts()))

print(df)

  column1  column2  column3 column4 counts
1       a        2     True   asdmn      2
2       b        2    False    asdd      1
3       c        3    False  asddas      1
4       a        2    False   grtgv      2
5       b        1    False   bdfbf      1
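To make the mechanics of this answer explicit, here is a sketch with the intermediate pieces spelled out (only the two key columns are kept, which is enough to show the lookup):

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "b", "c", "a", "b"],
    "column2": [2, 2, 3, 2, 1],
}, index=[1, 2, 3, 4, 5])

cols = ["column1", "column2"]
# value_counts on the column subset returns a Series keyed by a
# (column1, column2) MultiIndex
vc = df[cols].value_counts()
# Setting the same columns as the index lets .map look up each
# row's (column1, column2) pair in that Series
df = df.assign(counts=df.set_index(cols).index.map(vc))
print(df["counts"].tolist())  # [2, 1, 1, 2, 1]
```

This keeps the original index intact, since `assign` only adds a column.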
SomeDude

So the first step was to do a value count on several columns, and for that the responses here were useful. Basically I tried two options that worked for me:

counts = df.value_counts(['column1', 'column2']).reset_index(name='counts')

Then I used merge as in here:

counts = df.value_counts(['column1', 'column2']).reset_index(name='counts')
duplicated_sats.merge(counts, on=['column1', 'column2'])
J Agustin Barrachina