How to create a duplicate flag (column) that counts duplicate rows based on two columns?

Question

I have the following dataframe and would like to create a column at the end called "dup" showing the number of times the row shows up based on the "Seasons" and "Actor" columns. Ideally the dup column would look like this:

               Name  Seasons        Actor   dup
0   Stranger Things        3       Millie     1
1   Game of Thrones        8       Emilia     1
2  La Casa De Papel        4       Sergio     1
3         Westworld        3  Evan Rachel     1
4   Stranger Things        3       Millie     2
5  La Casa De Papel        4       Sergio     1

Do you want the duplicate row to be present in the dataframe? — NYC Coder, May 14 '20 at 20:16
Is the line at index 5 correct? Shouldn't that be 2 instead of 1? — Scott Boston, May 14 '20 at 20:17

Kurt Kline · Accepted Answer · 2020-05-14T21:02:19.910

This should do what you need:

df['dup'] = df.groupby(['Seasons', 'Actor']).cumcount() + 1

Output:

               Name  Seasons        Actor  dup
0   Stranger Things        3       Millie    1
1   Game of Thrones        8       Emilia    1
2  La Casa De Papel        4       Sergio    1
3         Westworld        3  Evan Rachel    1
4   Stranger Things        3       Millie    2
5  La Casa De Papel        4       Sergio    2

As Scott Boston mentioned, according to your criteria the last row should also be 2 in the dup column, since it is the second row with Seasons 4 and Actor Sergio.
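For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, so the construction code is an assumption about how the original data was loaded; only the groupby line comes from the answer itself.

```python
import pandas as pd

# Sample data retyped from the question (how the original df was built is unknown).
df = pd.DataFrame({
    "Name": ["Stranger Things", "Game of Thrones", "La Casa De Papel",
             "Westworld", "Stranger Things", "La Casa De Papel"],
    "Seasons": [3, 8, 4, 3, 3, 4],
    "Actor": ["Millie", "Emilia", "Sergio", "Evan Rachel", "Millie", "Sergio"],
})

# cumcount() numbers the rows inside each (Seasons, Actor) group from 0,
# in their original order; adding 1 makes the first occurrence 1.
df["dup"] = df.groupby(["Seasons", "Actor"]).cumcount() + 1

print(df)
```

Because cumcount() follows row order, the first occurrence of each pair gets 1 and every later repeat gets the next number.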

For more information, see this similar post: "SQL-like window functions in PANDAS".
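To make the window-function analogy concrete, here is a rough sketch and not something taken from the linked post. The running number from cumcount() behaves roughly like ROW_NUMBER() over a partition on Seasons and Actor, while transform("size") behaves roughly like COUNT(*) over the same partition. The names shows, keys and total are made up for illustration, and the data is retyped from the question.

```python
import pandas as pd

# Hypothetical frame retyped from the question's table.
shows = pd.DataFrame({
    "Name": ["Stranger Things", "Game of Thrones", "La Casa De Papel",
             "Westworld", "Stranger Things", "La Casa De Papel"],
    "Seasons": [3, 8, 4, 3, 3, 4],
    "Actor": ["Millie", "Emilia", "Sergio", "Evan Rachel", "Millie", "Sergio"],
})
keys = ["Seasons", "Actor"]

# Running occurrence number within each group (1 for the first row, 2 for the next, ...).
shows["dup"] = shows.groupby(keys).cumcount() + 1

# Total number of rows in each group, repeated on every row of that group.
shows["total"] = shows.groupby(keys)["Name"].transform("size")

print(shows)
```

If you want every duplicated row to carry the same count rather than a running number, the total column is the one to use.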
