Given a dataframe with duplicated values in an index column ix, such as this:

   ix  value
0   1     12
1   2     14
2   2     15
3   2     12
4   3     10
5   4      9
6   5     14
7   5     21

How can you add a column that numbers each occurrence of a duplicated value in ix, i.e. a running count within each group of identical ix values?

Desired result:

   ix  value  dupes
0   1     12      1
1   2     14      1
2   2     15      2
3   2     12      3
4   3     10      1
5   4      9      1
6   5     14      1
7   5     21      2

I've tried different approaches with pd.merge, for example:

pd.merge(df, df.groupby('ix').size().to_frame('dupes').reset_index(), on = 'ix', how = 'left')

But as you can see, this only gives the total number of rows for each ix value, not a running count:

   ix  value  dupes
0   1     12      1
1   2     14      3
2   2     15      3
3   2     12      3
4   3     10      1
5   4      9      1
6   5     14      2
7   5     21      2
vestland

1 Answer

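You can first group by ix and then use cumcount, which numbers the rows within each group starting from 0. Adding 1 makes the count start at 1: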

df['dupes'] = df.groupby('ix').cumcount().add(1)
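
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the dataframe from the question and applies the one-liner. The extra column named total is only an illustrative assumption (it is not part of the question); it uses transform("size") to show the per-group total that the merge attempt produced, for comparison with the running count.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ix": [1, 2, 2, 2, 3, 4, 5, 5],
    "value": [12, 14, 15, 12, 10, 9, 14, 21],
})

# cumcount numbers the rows inside each ix group as 0, 1, 2, ...
# add(1) shifts that to a 1-based running count
df["dupes"] = df.groupby("ix").cumcount().add(1)

# For comparison only ("total" is a hypothetical column name):
# transform("size") repeats the group total on every row,
# which is what the merge-based attempt returned
df["total"] = df.groupby("ix")["ix"].transform("size")

print(df)
```

The dupes column follows the existing row order within each group, so if the order of occurrences matters, sort the dataframe before calling cumcount.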
Allen Qin