3

Imagine you have the following df:

d = {'line amount#1': [0.21, 0.13, 0.1], 'line amount#2': [0.0, 0.05, .05], 'ExclBTW': [0.5, 0.18, .05]}
dftaxitems = pd.DataFrame(data=d)
dftaxitems


line amount#1   line amount#2   ExclBTW
0   0.21                0.00    0.50
1   0.13                0.05    0.18
2   0.10                0.05    0.05

Now, I want to change all values of line amounts to np.nan when they do not add up to the BTW column and keep the value when they do add up.

So I want to do it dynamically because the line amounts may be up to 10 lines large.

However getting the following error with the following code:

#change line amount if not totalling into ExclBTW:
dfchecklineamount = dftaxitems.filter(like='line amount').astype(float)
dfchecklineamount['sum'] = dfchecklineamount[list(dfchecklineamount.columns)].sum(axis=1)
dfchecklineamount['check'] = np.where(dfchecklineamount['sum'].astype(float) == dfresult['ExclBTW'].astype(float),True, False)

dfchecklineamount['check'] = np.where(dfchecklineamount['sum'].astype(float) == dfresult['ExclBTW'].astype(float),True, False)
colstochange = dfchecklineamount.filter(regex ='line amount').columns
dfchecklineamount[colstochange] = np.where(dfchecklineamount['check'] == False, np.nan,dfchecklineamount[colstochange] )







 ValueError: operands could not be broadcast together with shapes (2,) () (2,4) 

Please help!

Desired output:

line amount#1   line amount#2   ExclBTW
0   np.nan             np.nan    0.50
1   0.13                0.05     0.18
2   np.nan             np.nan     0.05
Max
  • 493
  • 2
  • 9

4 Answers4

4

We can use DataFrame.filter with sum over axis=1 then we set the values to NaN with DataFrame.mask:

lines = dftaxitems.filter(like="line")
m = lines.sum(axis=1).ne(dftaxitems["ExclBTW"])
dftaxitems[lines.columns] = lines.mask(m)
   line amount#1  line amount#2  ExclBTW
0            NaN            NaN     0.50
1           0.13           0.05     0.18
2            NaN            NaN     0.05
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • this seemed like the excellent solution however coming up with different results.. Line amounts are NaN'd when they shouldn't be – Max Apr 22 '21 at 08:00
  • 1
    Can you adjust your example data to reproduce the problem. The given one right now gives correct results – Erfan Apr 22 '21 at 08:03
  • My values were non-float so had to use .astype(float) – Max Apr 22 '21 at 12:53
4

EDIT: Adding more Generic solution which handles any number of similar lineamount columns

c = dftaxitems.filter(like='line amount#').columns
m = dftaxitems[c].sum(1).eq(dftaxitems['ExclBTW'])
dftaxitems.loc[~m, c] = np.nan
dftaxitems


Could you please try following. Simple explanation would be: we can use boolean indexing to fill the NaN values. First get mask in m variable(where we are checking condition if sum of 2 columns are equals to 3rd column), then using loc function to set NaN values accordingly.

m = (dftaxitems['line amount#1'] + dftaxitems['line amount#2']) == dftaxitems['ExclBTW']

dftaxitems.loc[~m, ['line amount#1', 'line amount#2']] = np.nan
dftaxitems

Output will be as follows:

   line amount#1  line amount#2  ExclBTW
0            NaN            NaN     0.50
1           0.13           0.05     0.18
2            NaN            NaN     0.05
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
0
dftaxitems['line amount#1'] = np.where(dftaxitems['line amount#1'] + dftaxitems['line amount#1'] >= dftaxitems['ExclBTW'], dftaxitems['line amount#1'], np.nan)
dftaxitems['line amount#2'] = np.where(dftaxitems['line amount#1'] + dftaxitems['line amount#1'] >= dftaxitems['ExclBTW'], dftaxitems['line amount#2'], np.nan)

I use this, but there must be a better way to solve it

Amri Rasyidi
  • 172
  • 1
  • 10
0
import pandas as pd
import numpy as np

d = {'line amount#1': [0.21, 0.13, 0.1], 'line amount#2': [0.0, 0.05, .05], 'ExclBTW': [0.5, 0.18, .05]}
df = pd.DataFrame(data=d)
print(df)

df['line amount#1'] = df['line amount#1'].where(df['line amount#1'] + df['line amount#2'] == df['ExclBTW'], other=np.nan)
df['line amount#2'] = df['line amount#1'].where(df['line amount#1'] + df['line amount#2'] == df['ExclBTW'], other=np.nan)
print(df)

Output:

   line amount#1  line amount#2  ExclBTW
0           0.21           0.00     0.50
1           0.13           0.05     0.18
2           0.10           0.05     0.05

   line amount#1  line amount#2  ExclBTW
0            NaN            NaN     0.50
1           0.13           0.13     0.18
2            NaN            NaN     0.05
Shi XiuFeng
  • 715
  • 5
  • 13
  • So I want to do it dynamically because the line amounts may be up to 10 lines large. – Max Apr 22 '21 at 07:39