How to add an index that identifies and counts duplicated values?

Question

With a dataframe that has duplicated values in a column ix, such as this:

   ix  value
0   1     12
1   2     14
2   2     15
3   2     12
4   3     10
5   4      9
6   5     14
7   5     21

How can you add a column that identifies duplicated ix values and numbers each occurrence, i.e. a running count within each ix group?

Desired result:

   ix  value  dupes
0   1     12      1
1   2     14      1
2   2     15      2
3   2     12      3
4   3     10      1
5   4      9      1
6   5     14      1
7   5     21      2
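To make the target concrete, each row's dupes value is the number of times its ix has appeared so far, counting from 1. A plain-Python sketch of that logic (the helper name running_counts is hypothetical, not part of pandas) might look like this:

```python
from collections import defaultdict


def running_counts(keys):
    """Return, for each key, how many times it has appeared so far (1-based)."""
    seen = defaultdict(int)  # key -> occurrences seen up to the current row
    out = []
    for k in keys:
        seen[k] += 1
        out.append(seen[k])
    return out


ix = [1, 2, 2, 2, 3, 4, 5, 5]  # the ix column from the example above
print(running_counts(ix))  # [1, 1, 2, 3, 1, 1, 1, 2]
```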

I've tried different approaches with pd.merge, for example

pd.merge(df, df.groupby('ix').size().to_frame('dupes').reset_index(), on = 'ix', how = 'left')

But, as you can see, this only gives the total number of rows for each ix value:

   ix  value  dupes
0   1     12      1
1   2     14      3
2   2     15      3
3   2     12      3
4   3     10      1
5   4      9      1
6   5     14      2
7   5     21      2
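The merge approach returns totals because groupby(...).size() produces one number per group, which the merge then copies onto every row of that group. A minimal sketch contrasting that total with a per-row running number, assuming the sample data above and a reasonably recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"ix": [1, 2, 2, 2, 3, 4, 5, 5],
                   "value": [12, 14, 15, 12, 10, 9, 14, 21]})

# transform("size") broadcasts each group's total row count back onto every
# row, which is effectively what the merge above computes: a total.
totals = df.groupby("ix")["value"].transform("size")

# cumcount() numbers rows within each group in order of appearance, from 0,
# so adding 1 gives the running count the question asks for.
running = df.groupby("ix").cumcount() + 1

print(totals.tolist())   # [1, 3, 3, 3, 1, 1, 2, 2]
print(running.tolist())  # [1, 1, 2, 3, 1, 1, 1, 2]
```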

`df.groupby('ix').cumcount() + 1` – U13-Forward Sep 09 '21 at 11:36
`df['dupes'] = df.groupby('ix').cumcount() + 1` – sophocles Sep 09 '21 at 11:36

score 1 · Accepted Answer · answered Sep 09 '21 at 11:36


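You can first group by ix and then use cumcount, which numbers the rows within each group starting from 0; adding 1 makes the count start at 1: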

df['dupes'] = df.groupby('ix').cumcount().add(1)


Allen Qin
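A self-contained sketch of the accepted answer, rebuilding the sample frame from the question. It also checks that cumcount follows the existing row order, so the ix values do not need to be sorted or grouped together first (the second frame is an illustrative example, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"ix": [1, 2, 2, 2, 3, 4, 5, 5],
                   "value": [12, 14, 15, 12, 10, 9, 14, 21]})

# cumcount() is 0-based within each ix group; .add(1) shifts it to start at 1.
df["dupes"] = df.groupby("ix").cumcount().add(1)
print(df)

# Rows do not need to be sorted or contiguous: numbering follows row order.
unsorted = pd.DataFrame({"ix": [5, 2, 5, 2, 5]})
unsorted["dupes"] = unsorted.groupby("ix").cumcount().add(1)
print(unsorted["dupes"].tolist())  # [1, 1, 2, 2, 3]
```

If the occurrences should be numbered in some other order, sort the frame first; cumcount itself does not reorder rows.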
