0

Summarize the Problem:

Basically trying to replicate the Maxifs function from Excel to Pandas based on unique ID. The raw data is a running total on payments based on unique ID and payment dates. The desired output would be the reverse of running total to see single payment on each payment date by unique ID

Below is the raw data from Excel -

ID Months_after_origination Principal_payment Origination_date
123 0 48.32 10/1/2021
123 1 75.65 10/1/2021
123 2 234.67 10/1/2021
123 3 380.37 10/1/2021

Below is the desired output -

ID Months_after_origination Principal_payment Origination_date Principal_single
123 0 48.32 10/1/2021 48.32
123 1 75.65 10/1/2021 27.33
123 2 234.67 10/1/2021 159.02
123 3 380.37 10/1/2021 145.70

What I've tried so far:

Been following the links below and trying to replicate but seems to need more work on where chain a bit more and hopefully could receive some feedback here. Link1 Link2

Some codes:

a_df["Principal_single"] = a_df["Principal_payment"].where(a_df["Months_after_origination"] < a_df["Months_after_origination"].values).groupby(a_df["ID"]).transform("max")

It turns out all NaN in the Principal_single field and suspect that where chain doesn't capture the condition. Thanks for the help here.

liamsuma
  • 156
  • 4
  • 19

1 Answers1

1

Here is one way to do it


# group by the ID, then subtract previous value from current value, using shift
df['principal_single'] = (df.groupby(['ID'])[['Principal_payment']]
                            .transform(lambda x: (x - x.shift(1)) ))

# fill the NaN value with the principal payment using ffill
df[['Principal_payment','principal_single']]= df[['Principal_payment','principal_single']].ffill(axis=1 )


df

    ID  Months_after_origination    Principal_payment   Origination_date    principal_single
0   123                        0                48.32   10/1/2021           48.32
1   123                        1                75.65   10/1/2021           27.33
2   123                        2               234.67   10/1/2021          159.02
3   123                        3               380.37   10/1/2021          145.70

Naveed
  • 11,495
  • 2
  • 14
  • 21
  • Thanks for the comments and appreciate the help. After testing the code, I see a length mismatch error just through the first line of code. – liamsuma Oct 04 '22 at 13:38
  • did it works on the data you shared in question? did you copy/paste the code? – Naveed Oct 04 '22 at 13:42
  • Thanks for the response. The shared data is only an example whereas the entire data has multiple rows by different IDs. The value error is `length mismatch: expected axis has 12105990 elements, new values have 12106005 elements` – liamsuma Oct 04 '22 at 13:55
  • what is version of pandas you're using? – Naveed Oct 04 '22 at 14:09
  • Confirmed no duplicated columns. After removing the `groupby`, it seems working when calculating the single payment sequentially. This is the code after removing `groupby` - `df["principal_single"]=df[['Principal_payment']].transform(lambda x: (x - x.shift(1)) )` – liamsuma Oct 04 '22 at 14:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/248542/discussion-between-naveed-and-liamsuma). – Naveed Oct 04 '22 at 14:11