2

I have the following dataframe.

         Date    Status   Amount  Balance
0  06-10-2000   Deposit    40.00     40.0
1  09-12-2002  Withdraw  1000.00      NaN
2  27-06-2001   Deposit    47.00      NaN
3  07-12-2021  Withdraw   100.00      NaN
4  06-10-2022   Deposit   120.00      NaN
5  06-10-2000   Deposit    40.00      NaN
6  09-12-2024  Withdraw    50.00      NaN

The goal is to update the Balance based on whether it is deposit or withdrawal, Initial balance is = the starting amount. Hence hard coded it as 40.0.

Below is my code, somehow I am not getting the expected result.

Expected result:

         Date    Status   Amount  Balance
0  06-10-2000   Deposit    40.00     40.0
1  09-12-2002  Withdraw  1000.00    -960.0
2  27-06-2001   Deposit    47.00    -913.0
3  07-12-2021  Withdraw   100.00    -1013.0
4  06-10-2022   Deposit   120.00    -893.0
5  06-10-2000   Deposit    40.00    -853.0
6  09-12-2024  Withdraw    50.00    -903.0

What am i doing wrong in the code, the code is below

import pandas as pd
with open(r"transactions.txt", "r") as Account:
    details = Account.read().split(",")
print("details of txt",details)

df=pd.DataFrame(details)

fg=df[0].str.extract('(?P<Date>.*) (?P<Status>.*) (?P<Amount>.*)')
print(fg)

fg['Amount'] = fg.Amount.str.replace('$','') #removing $ sign
#setting first row value of balance as 40, as equal to amount in 1st row
fg.loc[fg.index[0], 'Balance'] = 40.00 
print(fg)

for index, row in fg.iterrows():
    if index==0:
        continue
    if fg.loc[index,'Status']=='Deposit':
        print("reached here")
        fg.at[float(index),'Balance']=sum(fg.loc[float(index),'Amount'],fg.loc[float(index-1),'Balance'])
    elif fg.loc[index,'Status']=='withdraw':  
        fg.at[float(index),'Balance']=fg.loc[float(index),'Amount']-fg.loc[float(index-1),'Balance']

    print(fg)
user10083444
  • 105
  • 1
  • 1
  • 10

1 Answers1

1

IIUC, np.where and cumsum

df['Balance'] = np.where(df['Status'].eq('Deposit'),df['Amount'], df['Amount'] * -1)

df['Balance'] = df['Balance'].cumsum()

         Date    Status  Amount  Balance
0  06-10-2000   Deposit    40.0     40.0
1  09-12-2002  Withdraw  1000.0   -960.0
2  27-06-2001   Deposit    47.0   -913.0
3  07-12-2021  Withdraw   100.0  -1013.0
4  06-10-2022   Deposit   120.0   -893.0
5  06-10-2000   Deposit    40.0   -853.0
6  09-12-2024  Withdraw    50.0   -903.0
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • I am not able to replicate the solution. Do you mean to say that I don't have to loop over the dataframe at all? If not, how and where does your solutions suggestion fit in my above for loop and if part ? Pls clarify. – user10083444 Jun 04 '20 at 09:24
  • 1
    @user10083444 in pandas we avoid looping as we apply vectorised solutions across columns, I used your source data you can get rid of everything after `fg['Amount'] = fg.Amount.str.replace('$','')` but make sure Amount is an integer `print(df.dtypes)` – Umar.H Jun 04 '20 at 09:28
  • 1
    Understood @Datanovice. Thanks for this elegant solution. I guess i will refrain from using for loops on data frames here after. – user10083444 Jun 04 '20 at 11:15
  • 1
    @user10083444 it can be confusing because when you learn python you learn how to use IF statements and for loops but they don't fit with the pandas API which works with speed and performance. do you understand how `np.where` and `cumsum` works? – Umar.H Jun 04 '20 at 11:24
  • 1
    Yes now i do. Googled about it and read the scipy docs. Also found this excellent explanation in SO itself. https://stackoverflow.com/questions/34667282/numpy-where-detailed-step-by-step-explanation-examples – user10083444 Jun 04 '20 at 12:40