Creating a new column based on duplicates from another column

Question

I have a data frame that looks like this:

Contract_ID	Agreement_Date
1	05/04/1997
1	03/02/1997
2	05/08/2020
2	08/07/2020
3	05/07/2020
3	09/08/2020

I would like to create a new Agreement_Date_x column populated by the Agreement_Date from the duplicated Contract_ID. In summary, I would like an output like this:

Contract_ID	Agreement_Date	Agreement_Date_x
1	05/04/1997	03/02/1997
2	05/08/2020	08/07/2020
3	05/07/2020	09/08/2020

This is not show-cased in my example, but some Contract_IDs have 3+ duplicates (all with different Agreement_Dates). I would like to create a separate column for each duplicate.

Thank you kindly

score 0 · Answer 1 · answered Mar 14 '22 at 14:43

IIUC, you can add a helper column and pivot:

(df
 .assign(col=df.groupby('Contract_ID').cumcount().add(1))
 .pivot(index='Contract_ID', columns='col', values='Agreement_Date')
 .add_prefix('Agreement_Date_')
 .reset_index().rename_axis(columns=None)
)

output:

   Contract_ID Agreement_Date_1 Agreement_Date_2
0            1       05/04/1997       03/02/1997
1            2       05/08/2020       08/07/2020
2            3       05/07/2020       09/08/2020

score 0 · Answer 2 · answered Mar 14 '22 at 14:47

0

a = df.groupby('Contract_ID')['Agreement_Date']
df2 = pd.DataFrame(list(a.apply(list)))
df2['Contract_ID'] = a.groups.keys()

answered Mar 14 '22 at 14:47

SiP

1,080
3
8

Creating a new column based on duplicates from another column

2 Answers2