Pandas - compare and count repetitions

Question

Is it possible to compare values on the same dataframe and set a new column to count the repetitions with pandas?
From:

id | column1 | column2
0  | a       | a
1  | b       | b
2  | a       | c
3  | c       | c
4  | b       | b

To:

id | column1 | column2 | count
0  | a       | a       | 1
1  | b       | b       | 2
2  | a       | c       | 0
3  | c       | c       | 1
4  | b       | b       | 2

the solution you chose is incorrect (it gives 1 for a/c) – mozway Dec 05 '22 at 13:46 — mozway, Dec 05 '22 at 13:46
Do you need https://stackoverflow.com/q/17995024 ? – jezrael Dec 05 '22 at 13:48 — jezrael, Dec 05 '22 at 13:48

mozway · Answer 1 · 2022-12-05T13:39:51.363

You can use filter to get the columns of interest, identify the rows with all identical values and use it to create a custom group for a groupby.transform('size'):

# filter columns in "column"
df2 = df.filter(like='column')
# keep rows with identical columns
s = df2.iloc[:, 0].where(df2.eq(s, axis=0).all(axis=1))
# groupby unique value and get the count
df['count'] = df.groupby(s).transform('size').fillna(0, downcast='infer')

NB. this is working with an arbitrary number of columns

If you really have only two columns, you can simplify to:

m = df['column1'].eq(df['column2'])
df['count'] = (m.groupby(df['column1'].where(m))
                .transform('size').fillna(0, downcast='infer')
              )

Output:

   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2

jezrael · Answer 2 · 2022-12-05T13:46:14.807

0

Compare both columns to helper m column and use GroupBy.transform with sum:

df['count'] = (df.assign(m=df.column1.eq(df.column2))
                 .groupby(['column1', 'column2'])['m']
                 .transform('sum'))
print (df)
   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2

edited Dec 05 '22 at 13:46

answered Dec 05 '22 at 13:35

jezrael

822,522
95
1,334
1,252

Pandas - compare and count repetitions

2 Answers2