
So basically this question is equivalent to either 17709270 or 29791785, with the difference that I want to count values based on multiple columns instead of just one.

I have, for example:

    column1 column2 column3 column4 
1   a       2       True    asdmn
2   b       2       False   asdd
3   c       3       False   asddas
4   a       2       False   grtgv
5   b       1       False   bdfbf

and my result should be

    column1 column2 column3 column4  counts
1   a       2       True    asdmn    2
2   b       2       False   asdd     1
3   c       3       False   asddas   1
4   a       2       False   grtgv    2
5   b       1       False   bdfbf    1

If I am not mistaken, none of the responses to the previously referenced questions work for this case.

J Agustin Barrachina

3 Answers


You can do:

df['counts'] = df.groupby(['column1','column2'])['column1'].transform('size')
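For reference, a minimal runnable sketch of this approach on the question's example data (`column3`/`column4` included only to mirror the question):

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "column1": ["a", "b", "c", "a", "b"],
    "column2": [2, 2, 3, 2, 1],
    "column3": [True, False, False, False, False],
    "column4": ["asdmn", "asdd", "asddas", "grtgv", "bdfbf"],
}, index=[1, 2, 3, 4, 5])

# transform('size') broadcasts each group's size back to every row,
# so the result aligns with the original index
df["counts"] = df.groupby(["column1", "column2"])["column1"].transform("size")
print(df["counts"].tolist())  # [2, 1, 1, 2, 1]
```

The key point is that `transform` returns a Series aligned to the original index, whereas a plain `groupby(...).size()` would return one row per group.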
Quang Hoang

With value_counts, you could do:

cols = ["column1", "column2"]
df = df.assign(counts=df.set_index(cols).index.map(df[cols].value_counts()))

print(df)

  column1  column2  column3 column4 counts
1       a        2     True   asdmn      2
2       b        2    False    asdd      1
3       c        3    False  asddas      1
4       a        2    False   grtgv      2
5       b        1    False   bdfbf      1
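To make the mechanics of this answer explicit, here is a sketch with the intermediate pieces spelled out (only the two key columns are kept, which is enough to show the lookup):

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "b", "c", "a", "b"],
    "column2": [2, 2, 3, 2, 1],
}, index=[1, 2, 3, 4, 5])

cols = ["column1", "column2"]
# value_counts on the column subset returns a Series keyed by a
# (column1, column2) MultiIndex
vc = df[cols].value_counts()
# Setting the same columns as the index lets .map look up each
# row's (column1, column2) pair in that Series
df = df.assign(counts=df.set_index(cols).index.map(vc))
print(df["counts"].tolist())  # [2, 1, 1, 2, 1]
```

This keeps the original index intact, since `assign` only adds a column.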
SomeDude

So the first step was to do a value count on several columns, and for that the responses here were useful. Basically I tried two options that worked for me:

counts = df.value_counts(['column1', 'column2']).reset_index(name='counts')

Then I used merge as in here:

counts = df.value_counts(['column1', 'column2']).reset_index(name='counts')
duplicated_sats.merge(counts, on=['column1', 'column2'])
J Agustin Barrachina