0

I need to replace values in a pandas DataFrame with other values when certain conditions hold, but I get an error:

>>> df
      X0  X1  X2
0      a   1  0
1      b   3  0
2      c   2  0
3      c   4  0

formula: if (X0 != "a" and X0 != "b") set X2 = X0+X1

The expected result is:

>>> df
      X0  X1  X2
0      a   1  0
1      b   3  0
2      c   2  c2
3      c   4  c4

I tried:

df.loc[df.X0 != "a" and df.X0 != "b", "X2"] = df.X1 + df.X2

but I get "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

Max Sh

3 Answers

2

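An efficient way is to use np.where, with each condition wrapped in parentheses and the two combined with & instead of and:

import numpy as np
# Where X0 is neither 'a' nor 'b', join X0 and X1 as strings; keep 0 elsewhere
mask = (df['X0'] != 'a') & (df['X0'] != 'b')
df['X2'] = np.where(mask, df[['X0', 'X1']].astype(str).apply(''.join, axis=1), 0)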

which prints:

df
Out[47]: 
  X0  X1  X2
0  a   1   0
1  b   3   0
2  c   2  c2
3  c   4  c4
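
The two inequality tests can also be folded into a single mask with `Series.isin`, which may read more easily if the list of excluded values grows. This is a minimal, self-contained sketch rather than part of the original answer; the frame is rebuilt from the question's data, and the string concatenation is done with `+` instead of `apply`.

```python
import numpy as np
import pandas as pd

# Example frame copied from the question
df = pd.DataFrame({"X0": ["a", "b", "c", "c"], "X1": [1, 3, 2, 4], "X2": [0, 0, 0, 0]})

# True for rows whose X0 is neither "a" nor "b"; ~ negates the boolean mask
mask = ~df["X0"].isin(["a", "b"])

# Concatenate X0 with X1 rendered as text where the mask holds; keep 0 elsewhere
df["X2"] = np.where(mask, df["X0"] + df["X1"].astype(str), 0)
print(df)
```

Note that X2 ends up holding a mix of integers and strings, so it is no longer a numeric column.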
Rick M
sophocles
1

Here is a simple solution. If you have any questions, feel free to ask.

import numpy as np
import pandas as pd

dff = pd.DataFrame({'X0': ['a', 'b', 'c', 'c'], 'X1': [1, 3, 2, 4], 'X2': [0, 0, 0, 0]})
_condition1 = dff.X0 != 'a'
_condition2 = dff.X0 != 'b'
# X1 is cast to str so it can be concatenated with X0
dff['X2'] = np.where(_condition1 & _condition2, dff.X0 + dff.X1.astype(str), 0)
dff

Lumber Jack
1

Starting with:

import pandas as pd
df = pd.DataFrame({'X0': ['a', 'b', 'c', 'c'], 'X1': [1, 3, 2, 4], 'X2': [0, 0, 0, 0]})

You got the error because the conditions were not wrapped in parentheses and were joined with "and" rather than the "&" operator. If you want to do it with .loc, this works:

df.loc[(df.X0 != "a") & (df.X0 != "b"), "X2"] = df.X0 + df.X1.astype(str)

print(df)
  X0  X1  X2
0  a   1   0
1  b   3   0
2  c   2  c2
3  c   4  c4
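
Depending on the installed pandas version, writing strings into an integer column through `.loc` may trigger a dtype warning or error. That is an assumption about newer releases, not something reported in this thread. A hedged sketch that sidesteps the issue by casting X2 to `object` before the assignment:

```python
import pandas as pd

# Example frame copied from the question
df = pd.DataFrame({"X0": ["a", "b", "c", "c"], "X1": [1, 3, 2, 4], "X2": [0, 0, 0, 0]})

# Let X2 hold both integers and strings (assumed workaround, not from the answer)
df["X2"] = df["X2"].astype(object)

# Parenthesised conditions joined with the element-wise & operator
mask = (df["X0"] != "a") & (df["X0"] != "b")

# Only the masked rows are overwritten; the right-hand side is aligned on the index
df.loc[mask, "X2"] = df["X0"] + df["X1"].astype(str)
print(df)
```

If X2 only ever needs to hold text, casting the whole column to `str` would be another option, at the cost of turning the untouched zeros into "0".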
Max Sh
Rick M
  • Hi Rick! Please explain my mistake: why do you use the bitwise "&" instead of "and"? – Max Sh Feb 02 '21 at 20:53
  • I've got a warning: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy iloc._setitem_with_indexer(indexer, value, self.name)" – Max Sh Feb 02 '21 at 21:08
  • Hi there... it's because you're doing a *vectorized* element-by-element AND of two boolean arrays, and numpy and pandas use the `&` operator for that. Each expression in parentheses is a 4-element array of True/False. You get an error when using `and` because it tries to reduce each array as a whole to a single True or False, and that is ambiguous, so it isn't allowed. Also see https://stackoverflow.com/questions/22646463/and-boolean-vs-bitwise-why-difference-in-behavior-with-lists-vs-nump . – Rick M Feb 02 '21 at 21:09
  • The `SettingWithCopyWarning` usually means that you're working on a view of a dataframe that you've already selected a subset of in an earlier step, like `df = df_orig[condition]` . If that's the case, you can do `df = df_orig[condition].copy()` to eliminate the warning. – Rick M Feb 02 '21 at 21:12
  • And which method is more efficient, np.where or .loc[]? – Max Sh Feb 02 '21 at 21:28
  • I'm not sure it's generally true that one is more efficient than the other; you can certainly find a number of SO questions on it if you explore here. My thought is that if you need that level of optimization, you can test the timing on your actual code or a subset of your data. For example, if it's in a Jupyter notebook you can use `%%timeit` at the top of the cell to get a measurement of performance. – Rick M Feb 02 '21 at 21:50
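
The difference between `and` and `&` discussed in the comments can be reproduced in a few lines. This is an illustrative sketch using the question's data; the names `not_a`, `not_b`, and `and_error` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"X0": ["a", "b", "c", "c"], "X1": [1, 3, 2, 4]})

# Each comparison yields a boolean Series with one entry per row
not_a = df["X0"] != "a"
not_b = df["X0"] != "b"

# `and` asks for the truth value of the whole left-hand Series,
# which pandas refuses to give for a multi-element Series
try:
    not_a and not_b
    and_error = None
except ValueError as exc:
    and_error = str(exc)

# `&` combines the two Series row by row instead
mask = not_a & not_b

print(and_error)
print(mask.tolist())
```

The parentheses in `(df.X0 != "a") & (df.X0 != "b")` matter because `&` binds more tightly than `!=` in Python, so without them the expression is grouped differently from what was intended.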