0

Imaging there is a dataframe with a lot of missing transaction_total and balance_total and date

id,date,transaction_total,balance_total
1,01/01/2019,-1,102
1,01/02/2019,-2,100
1,01/03/2019,-3,
1,01/04/2019,,
1,01/05/2019,-4,
2,01/01/2019,-2,200
2,01/02/2019,-2,100
2,01/04/2019,,
2,01/05/2019,-4,

Here is the create-input script:

import pandas as pd
import numpy as np

users=pd.DataFrame(
                [
                {'id':1,'date':'01/01/2019', 'transaction_total':-1, 'balance_total':102},
                {'id':1,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
                {'id':1,'date':'01/03/2019', 'transaction_total':-3, 'balance_total':''},
                {'id':1,'date':'01/04/2019', 'transaction_total':'', 'balance_total':''},
                {'id':1,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':''},
                {'id':2,'date':'01/01/2019', 'transaction_total':-2, 'balance_total':200},
                {'id':2,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
                {'id':2,'date':'01/04/2019', 'transaction_total':'', 'balance_total':''},
                {'id':2,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':''}  
                ]
                )

the goal is to achieve the following:

desired FINAL output:

id,date,balance_total
1,01/01/2019,102
1,01/02/2019,100
1,01/03/2019,97
1,01/04/2019,97
1,01/05/2019,93
2,01/01/2019,200
2,01/02/2019,100
2,01/03/2019,97
2,01/04/2019,97
2,01/05/2019,93

(1) if the date is missing, fill the date with the balance from the previous date (i think the reindex solution in this link might work Pandas filling missing dates and values within group )

(2)if the balance_total is missing while there are valid 'date' and 'transaction_total', fill the "balance total" with "the previous date's balance_total-the transaction_total on the date when the balance_total is missing" (the case in row 3: 100+ (-3)=97)

(3) if there is a valid date, but both of the transaction_total and balance_total are NaN, just fill in the last date's balance_total (e.g. row 4: since the total_balance in 01/03/2019 will be 97 based on the previous calculation, the 01/04/2019 balance will be 97 because there is no transaction_total.)

desired metadata output:

id,date,transaction_total,balance_total
1,01/01/2019,-1,102
1,01/02/2019,-2,100
1,01/03/2019,-3,97
1,01/04/2019,0,97
1,01/05/2019,-4,93
2,01/01/2019,-2,200
2,01/02/2019,-2,100
2,01/03/2019,-3,97
2,01/04/2019,,97
2,01/05/2019,-4,93
Chubaka
  • 2,933
  • 7
  • 43
  • 58

0 Answers0