Create new column (pandas dataframe) when duplicate ids have a payment date

Question

I have a pandas dataframe:

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'payment_count': [1, 2, 1, 2, 1, 2],
                   'payment_date': ['2/2/2020', '4/6/2020', '3/20/2020', '3/29/2020', '5/1/2020', '5/30/2020']})

I want to take max('payment_count') by each 'id' and create a new column with the associated 'payment_date'. Desired output:

pd.DataFrame({'id': [1, 2, 3],
         'payment_date_1': ['2/2/2020', '3/20/2020', '5/1/2020'],
         'payment_date_2': ['4/6/2020', '3/29/2020', '5/30/2020']})

Yes, but I want to map 'payment_count' == 2 to a new column somehow. Also, I'm not trying to aggregate anything, so I'm not sure how groupby helps here. — Michael Mathews Jr., Jul 30 '20 at 19:30
It is pivoting. You may try to use `df.pivot` and change column names with `add_prefix`. Read about pivoting dataframe: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe — Andy L., Jul 30 '20 at 19:48

MrNobody33 · Answer 1 · 2020-07-30T20:09:33.863

3

You can try with pivot, add_prefix, rename_axis and reset_index

df.pivot(index='id',columns='payment_count',values='payment_date')\
   .rename_axis(None, axis = 1)\
   .add_prefix('payment_date_')\
   .reset_index()

Output:

   id payment_date_1 payment_date_2
0   1      2/2/2020      4/6/2020
1   2     3/20/2020     3/29/2020
2   3      5/1/2020     5/30/2020
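
For anyone who wants to paste and run it, here is a self-contained sketch of the same chain, assuming the sample frame from the question. The name `result` is only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'payment_count': [1, 2, 1, 2, 1, 2],
                   'payment_date': ['2/2/2020', '4/6/2020', '3/20/2020',
                                    '3/29/2020', '5/1/2020', '5/30/2020']})

result = (
    df.pivot(index='id', columns='payment_count', values='payment_date')
      .rename_axis(None, axis=1)      # drop the 'payment_count' label on the columns
      .add_prefix('payment_date_')    # columns 1, 2 become payment_date_1, payment_date_2
      .reset_index()                  # turn 'id' back into a regular column
)
print(result)
```

Keep in mind that `pivot` raises a `ValueError` if the same (`id`, `payment_count`) pair appears more than once, so this assumes each pair is unique.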

edited Jul 30 '20 at 20:09

answered Jul 30 '20 at 19:58

MrNobody33


score 1 · Accepted Answer · answered Jul 30 '20 at 20:02

1

Another way using groupby.

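# Number each payment within its id (1, 2, ...) in row order
df['paydate'] = df.groupby('id')['payment_date'].cumcount()+1
df['paydate'] = 'payment_date_' + df['paydate'].astype(str)
# Move the labels into columns, leaving one row per id
df = df.set_index(['paydate','id'])['payment_date']
df = df.unstack(0).rename_axis(None)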
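
A runnable sketch of the same idea that does not overwrite `df`, assuming the sample frame from the question. The names `order` and `wide` are only for illustration, and the prefix here includes an underscore so the column names match the desired output.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'payment_count': [1, 2, 1, 2, 1, 2],
                   'payment_date': ['2/2/2020', '4/6/2020', '3/20/2020',
                                    '3/29/2020', '5/1/2020', '5/30/2020']})

# Number the rows within each id: 1, 2, ... following the current row order.
order = df.groupby('id').cumcount() + 1

wide = (
    df.assign(paydate='payment_date_' + order.astype(str))
      .set_index(['id', 'paydate'])['payment_date']
      .unstack('paydate')             # one column per paydate label
      .rename_axis(None, axis=1)      # drop the 'paydate' label on the columns
      .reset_index()
)
print(wide)
```

Note that `cumcount` numbers rows by their position within each group, not by the value in `payment_count`, so if the rows are not already ordered you would probably want `df.sort_values(['id', 'payment_count'])` first.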

answered Jul 30 '20 at 20:02

rhug123


1

Awesome, this actually was the best option for me, as my real data had up to 50 'payment_count', so the pivot created 50 new columns which I didn't need. – Michael Mathews Jr. Jul 30 '20 at 20:12

score 0 · Answer 3 · answered Jul 30 '20 at 20:03

Ugly but it does what you asked. pivot sounds better though.

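groups = df.groupby('id')
# For each id, the positions (within that group) that sort it by payment_count
args = {key: group.payment_count.argsort() for key, group in groups}

records = []
for k, v in args.items():
    # Apply the positions to this id's own rows, not as labels on the whole frame
    dates = groups.get_group(k).payment_date.iloc[v.to_numpy()]
    payments = {'id': k}
    payments.update({f'payment_date_{i}': date
                     for i, date in enumerate(dates, start=1)})
    records.append(payments)

_df = pd.DataFrame(records)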
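
A note on `argsort` in this approach: it returns positions within each group, so they need to be applied positionally to that group's rows. Used as labels on the full column, they would pick rows 0 and 1 for every id. A quick check of the difference, assuming the question's sample frame (the variable names are only for illustration):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'payment_count': [1, 2, 1, 2, 1, 2],
                   'payment_date': ['2/2/2020', '4/6/2020', '3/20/2020',
                                    '3/29/2020', '5/1/2020', '5/30/2020']})

group = df[df['id'] == 2]                        # rows labelled 2 and 3
positions = group['payment_count'].argsort()     # values 0 and 1: positions inside the group

# Positional lookup inside the group gives the dates for id 2.
by_position = group['payment_date'].iloc[positions.to_numpy()].tolist()

# Label lookup on the full column gives rows 0 and 1, which belong to id 1.
by_label = df['payment_date'].loc[positions.to_numpy()].tolist()

print(by_position)  # ['3/20/2020', '3/29/2020']
print(by_label)     # ['2/2/2020', '4/6/2020']
```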
