I have a large (5000+ rows) CSV file of transactions that we know contains some errors.
It has the following fields:
date        description  money_in  money_out  balance
01-01-2017  stringvalue  349       0          1000
02-01-2017  stringvalue  0         100        900
03-01-2017  stringvalue  10        0          890
To check which rows contain faulty data, I've added the following code:
df['difference'] = df['money_in'] - df['money_out']
df['BalanceDif'] = df['balance'] - df['balance'].shift()
df['RowCorrect'] = df['BalanceDif'].equals(df['difference'])
This gives the following (somewhat puzzling) output (first columns left out):
Balance  difference  BalanceDif  RowCorrect
682.36   30          30          False
758.36   76          76          False
708.36   -50         -50         False
707.57   -0.79       -0.79       False
712.57   5           5           False
762.57   50          50          False
Does anyone know what I am doing wrong, and why df['RowCorrect'] is False even for rows where 'difference' and 'BalanceDif' appear to match?
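
For reference, here is a minimal sketch that seems to reproduce the behaviour. The numbers are made up for illustration (they are not from the real file), and the column names money_in and money_out are assumed from the field list above. As far as I can tell, Series.equals compares the two columns as a whole and returns a single bool, which then gets broadcast to every row. An elementwise comparison with numpy.isclose is included as one possible alternative, not necessarily the right fix for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the real file); the last row is deliberately wrong.
df = pd.DataFrame({
    "money_in":  [0.0, 30.0, 76.0, 0.0, 10.0],
    "money_out": [0.0, 0.0, 0.0, 50.0, 0.0],
    "balance":   [652.36, 682.36, 758.36, 708.36, 700.00],
})

df["difference"] = df["money_in"] - df["money_out"]
df["BalanceDif"] = df["balance"] - df["balance"].shift()

# Series.equals tests whether the two Series are identical as a whole.
# It returns one bool, and assigning that bool fills the entire column with it.
whole_column_equal = df["BalanceDif"].equals(df["difference"])
df["RowCorrect"] = whole_column_equal

# Elementwise alternative: np.isclose compares row by row and tolerates
# small floating-point rounding. The first row has no previous balance,
# so BalanceDif is NaN there and the comparison is False.
df["RowCorrectElementwise"] = np.isclose(df["BalanceDif"], df["difference"])

print(df[["balance", "difference", "BalanceDif", "RowCorrect", "RowCorrectElementwise"]])
```

With this sample, RowCorrect is False on every row, while RowCorrectElementwise is only False for the first row (no previous balance) and the deliberately wrong last row.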