0

I have a data frame that looks like this:

Contract_ID Agreement_Date
1 05/04/1997
1 03/02/1997
2 05/08/2020
2 08/07/2020
3 05/07/2020
3 09/08/2020

I would like to create a new Agreement_Date_x column populated by the Agreement_Date from the duplicated Contract_ID. In summary, I would like an output like this:

Contract_ID Agreement_Date Agreement_Date_x
1 05/04/1997 03/02/1997
2 05/08/2020 08/07/2020
3 05/07/2020 09/08/2020

This is not show-cased in my example, but some Contract_IDs have 3+ duplicates (all with different Agreement_Dates). I would like to create a separate column for each duplicate.

Thank you kindly

Mitchell
  • 55
  • 5

2 Answers2

0

IIUC, you can add a helper column and pivot:

(df
 .assign(col=df.groupby('Contract_ID').cumcount().add(1))
 .pivot(index='Contract_ID', columns='col', values='Agreement_Date')
 .add_prefix('Agreement_Date_')
 .reset_index().rename_axis(columns=None)
)

output:

   Contract_ID Agreement_Date_1 Agreement_Date_2
0            1       05/04/1997       03/02/1997
1            2       05/08/2020       08/07/2020
2            3       05/07/2020       09/08/2020
mozway
  • 194,879
  • 13
  • 39
  • 75
0
a = df.groupby('Contract_ID')['Agreement_Date']
df2 = pd.DataFrame(list(a.apply(list)))
df2['Contract_ID'] = a.groups.keys()
SiP
  • 1,080
  • 3
  • 8