0

Is it possible to compare values on the same dataframe and set a new column to count the repetitions with pandas?
From:

id | column1 | column2
0  | a       | a
1  | b       | b
2  | a       | c
3  | c       | c
4  | b       | b

To:

id | column1 | column2 | count
0  | a       | a       | 1
1  | b       | b       | 2
2  | a       | c       | 0
3  | c       | c       | 1
4  | b       | b       | 2
null92
  • 145
  • 8

2 Answers2

0

You can use filter to get the columns of interest, identify the rows with all identical values and use it to create a custom group for a groupby.transform('size'):

# filter columns in "column"
df2 = df.filter(like='column')
# keep rows with identical columns
s = df2.iloc[:, 0].where(df2.eq(s, axis=0).all(axis=1))
# groupby unique value and get the count
df['count'] = df.groupby(s).transform('size').fillna(0, downcast='infer')

NB. this is working with an arbitrary number of columns

If you really have only two columns, you can simplify to:

m = df['column1'].eq(df['column2'])
df['count'] = (m.groupby(df['column1'].where(m))
                .transform('size').fillna(0, downcast='infer')
              )

Output:

   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Compare both columns to helper m column and use GroupBy.transform with sum:

df['count'] = (df.assign(m=df.column1.eq(df.column2))
                 .groupby(['column1', 'column2'])['m']
                 .transform('sum'))
print (df)
   id column1 column2  count
0   0       a       a      1
1   1       b       b      2
2   2       a       c      0
3   3       c       c      1
4   4       b       b      2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252