I want to calculate the excess amount remaining in an ATM from a given dataset of transactions and replenishments.

I can do it by looping over the data and subtracting each transaction from the current amount, but I need to do this without using a loop.

# R: Replenishment amount
# T: Transaction Amount
'''
R    T 
100  50
0    30
0    10
200  110
0    30 
60   20
'''
import numpy as np
import pandas as pd

data = {'Date': pd.date_range('2011-05-03', '2011-05-08').tolist(), 'R': [100, 0, 0, 200, 0, 60], 'T': [50, 30, 10, 110, 30, 20]}
df = pd.DataFrame(data)

# Calculate a temporary amount and shift it by one row so that the next
# transaction can be subtracted from it
df['temp'] = (df['R'] - df['T']).shift(1).bfill()

# Boolean indicating whether ATM was replenished or not
# 1: Replenished, 0: Not Replenished
df['replenished'] = (df['R'] >0).astype(int)

# If replenished subtract transaction amount from the replenishment amount
# otherwise subtract it from temp amount
df['replenished']*df['R']+(np.logical_not(df['replenished']).astype(int))*df['temp']-df['T']



Expected Results:
0    50.0
1    20.0
2    10.0
3    90.0
4    60.0
5    40.0
dtype: float64

Actual Results:
0    50.0
1    20.0
2    -40.0
3    90.0
4    60.0
5    40.0
dtype: float64
Saifullah
  • Why do you need to do it without using a loop? – WiseDev Jul 04 '19 at 13:48
  • @BlueRineS because it's usually a big performance hit. See [this](https://stackoverflow.com/questions/24870953/does-pandas-iterrows-have-performance-issues) for details. – Quang Hoang Jul 04 '19 at 14:06

1 Answer

First, we compute a boolean column that marks the days on which the ATM was replenished, as you already do.

df['replenished'] = df['R'] > 0

We also compute the increment in money, which will be useful to perform the rest of the operations.

df['increment'] = df['R'] - df['T']

We also create the column that will eventually hold the desired values; I called it reserve. To begin, we take the cumulative sum of the increments, which is already the desired value from the first replenishment day until the next one.

df['reserve'] = df['increment'].cumsum()
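
For the sample data in the question, the intermediate state after this step can be checked with a short, self-contained sketch. The frame is rebuilt here only so that the snippet runs on its own; the Date column is left out because it plays no part in the arithmetic.

```python
import pandas as pd

# Sample data from the question (R: replenishment, T: transaction).
df = pd.DataFrame({
    'R': [100, 0, 0, 200, 0, 60],
    'T': [50, 30, 10, 110, 30, 20],
})
df['increment'] = df['R'] - df['T']
df['reserve'] = df['increment'].cumsum()

print(df['reserve'].tolist())  # [50, 20, 10, 100, 70, 110]
```

The first three values already match the expected result. From the second replenishment (index 3) onward the plain cumulative sum still carries the earlier balance, which is what the following steps correct.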

Now we create an auxiliary variable, df_aux, that holds the part of the dataframe still to be processed. It starts out as an alias of df, not a copy. Inside the loop it is rebound to slices of df, and whether a write to such a slice reaches df depends on the pandas version, so the loop below writes the results to df explicitly through .loc.

df_aux = df

Then we can proceed to the loop that will take care of the problem.

while df_aux['replenished'].any():
    # Keep only the rows from the next replenishment day onward.
    df_aux = df_aux.loc[df_aux['replenished'].idxmax():]
    k = df_aux.at[df_aux.index[0], 'reserve']
    l = df_aux.at[df_aux.index[0], 'increment']
    # Shift the cumulative sum so that this block starts at its own increment.
    df.loc[df_aux.index, 'reserve'] = df_aux['reserve'] - k + l
    # Drop the replenishment day just handled and re-read the updated rows.
    df_aux = df.loc[df_aux.index[1:]]

Inside the loop, we first restrict the dataframe to the rows from the next replenishment day onward. From that day until the following replenishment, the cumulative sum gives the desired outcome provided its first value equals that day's increment, so we shift the cumulative sum (subtracting k and adding l) so that the first value satisfies this condition.

Then, if this was the last row of the dataframe our work is done and we get out of the loop. If it wasn't, then we drop the replenishment day we just calculated and go on to the next days.

After all these operations, the result (df) is this:


    Date        R       T       increment   replenished     reserve
0   2011-05-03  100     50      50          True            50
1   2011-05-04  0       30      -30         False           20
2   2011-05-05  0       10      -10         False           10
3   2011-05-06  200     110     90          True            90
4   2011-05-07  0       30      -30         False           60
5   2011-05-08  60      20      40          True            40

I don't have much experience with measuring computation time, so I'm not sure whether this solution is faster than looping through all rows.
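
If the loop over replenishment days is itself a concern, one possible alternative is a grouped cumulative sum. The sketch below is not the method described above; it assumes, as the expected results imply, that each replenishment resets the balance to R rather than adding to it, and that any rows before the first replenishment start from a balance of 0. The name block is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('2011-05-03', '2011-05-08'),
    'R': [100, 0, 0, 200, 0, 60],
    'T': [50, 30, 10, 110, 30, 20],
})

# Each replenishment starts a new block of rows; the running count of
# replenishment days serves as the block label (hypothetical name).
block = (df['R'] > 0).cumsum()

# Within each block, the balance is the running sum of R - T.
df['reserve'] = (df['R'] - df['T']).groupby(block).cumsum()

print(df['reserve'].tolist())  # [50, 20, 10, 90, 60, 40]
```

This reproduces the expected results from the question for the sample data without an explicit Python loop; whether it is actually faster on a large dataset would need to be timed.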

Alejandro