0

I have a DataFrame like the one below. I want to update the values of columns C, D, and E based on columns A and B.

If A < B in a row, then C, D, and E should be set to A; otherwise they should be set to B. I tried the code below, but I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

data = [[0, 1, 0, 0, 0],
        [1, 2, 0, 0, 0],
        [2, 0, 0, 0, 0],
        [2, 4, 0, 0, 0],
        [1, 8, 0, 0, 0],
        [3, 2, 0, 0, 0]]

df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])

df

Out[59]: 
   A  B  C  D  E
0  0  1  0  0  0
1  1  2  0  0  0
2  2  0  0  0  0
3  2  4  0  0  0
4  1  8  0  0  0
5  3  2  0  0  0

list_1 = ['C', 'D', 'E']
for i in df[list_1]:
    if df['A'] < df['B']:  # this line raises the ValueError
        df[i] = df['A']
    else:
        df[i] = df['B']

I'm expecting the output below:

df
Out[59]: 
   A  B  C  D  E
0  0  1  0  0  0
1  1  2  1  1  1
2  2  0  0  0  0
3  2  4  2  2  2
4  1  8  1  1  1
5  3  2  2  2  2
Shanoo

6 Answers

1

np.where returns elements chosen from A or B depending on the condition.

df.assign assigns new columns to a DataFrame. It returns a new object with all the original columns in addition to the new ones; existing columns that are re-assigned are overwritten.

import numpy as np

nums = np.where(df.A < df.B, df.A, df.B)
df = df.assign(C=nums, D=nums, E=nums)
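
To check this approach end to end, here is a minimal self-contained sketch that rebuilds the frame from the question and applies np.where followed by assign. The names result and expected are introduced here only for illustration; expected holds the C/D/E values from the question's expected-output table.

```python
import numpy as np
import pandas as pd

data = [[0, 1, 0, 0, 0],
        [1, 2, 0, 0, 0],
        [2, 0, 0, 0, 0],
        [2, 4, 0, 0, 0],
        [1, 8, 0, 0, 0],
        [3, 2, 0, 0, 0]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])

# Element-wise choice: take A where A < B, otherwise take B.
nums = np.where(df['A'] < df['B'], df['A'], df['B'])

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(C=nums, D=nums, E=nums)

# Values copied from the question's expected output (illustrative name).
expected = [0, 1, 0, 2, 1, 2]
print(result)
```

Because assign does not modify df in place, the result has to be bound to a name (or back to df, as in the answer above).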
1

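Use DataFrame.mask. Note that this only overwrites rows where B > A and leaves all other rows unchanged, so row 5 (A = 3, B = 2) keeps 0 rather than being set to B:

df.loc[:, df.columns != 'B'] = df.loc[:, df.columns != 'B'].mask(df['B'] > df['A'], df['A'], axis=0)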
print(df)

   A  B  C  D  E
0  0  1  0  0  0
1  1  2  1  1  1
2  2  0  0  0  0
3  2  4  2  2  2
4  1  8  1  1  1
5  3  2  0  0  0
ansev
0

It isn't clear what you are trying to achieve here, because the condition df['A'] < df['B'] does not depend on the loop variable and gives the same result on every iteration. To explain the error:

When you write if df['A'] < df['B']:

the if statement expects a single Boolean, but df['A'] < df['B'] produces a Series of Boolean values, one per row. That is why the error message suggests reducing it to a single value, with something like

if (df['A'] < df['B']).all():

OR

if (df['A'] < df['B']).any():
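
As a small illustration of the point above, the following sketch reproduces the error on the question's A and B columns and shows what .any() and .all() reduce the comparison to. The variable names (cond, error_message, any_true, all_true) are made up for this example. Neither reduction gives the row-by-row behaviour the question asks for; they only explain why the if statement fails.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3],
                   'B': [1, 2, 0, 4, 8, 2]})

# The comparison yields one Boolean per row, not a single Boolean.
cond = df['A'] < df['B']

error_message = None
try:
    if cond:  # pandas refuses to collapse the Series to one truth value
        pass
except ValueError as exc:
    error_message = str(exc)

# Explicit reductions to a single Boolean.
any_true = bool(cond.any())  # True if at least one row has A < B
all_true = bool(cond.all())  # True only if every row has A < B

print(error_message)
print(any_true, all_true)
```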
Vishnudev Krishnadas
0

Personally, I always use .apply to modify columns based on other columns:

list_1 = ['C', 'D', 'E']
for i in list_1:
    # Column names are case-sensitive, so use x.A and x.B
    df[i] = df.apply(lambda x: x.A if x.A < x.B else x.B, axis=1)
giulio
0

I would create a DataFrame with only columns 'A' and 'B', and then create column 'C' as follows:

df['C'] = df.min(axis=1)

Columns 'D' and 'E' seem to be redundant.

If you have to start with all the columns and need to have all of them as output then you can do the following:

df['C'] = df[['A', 'B']].min(axis=1)
df['D'] = df['C']
df['E'] = df['C']
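
The row-wise minimum works here because "A if A < B else B" is the smaller of the two values (when A equals B, both branches give the same number). The sketch below checks that equivalence on the question's data; row_min, via_where and same are names chosen for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3],
                   'B': [1, 2, 0, 4, 8, 2]})

# Smaller of A and B in each row.
row_min = df[['A', 'B']].min(axis=1)

# The rule from the question, written out explicitly for comparison.
via_where = np.where(df['A'] < df['B'], df['A'], df['B'])
same = bool((row_min.to_numpy() == via_where).all())

# Fill all three target columns with the same values.
for col in ['C', 'D', 'E']:
    df[col] = row_min

print(same)
print(df)
```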
shev.m
0

You can use NumPy's where function:

import numpy as np

df.loc[:,'C':'E'] = np.where(df['A'] < df['B'], df['A'], df['B']).reshape(-1, 1)
Mykola Zotko