iterating over a list of columns in pandas dataframe

Question

I have a dataframe like the one below. I want to update the values of columns C, D, and E based on columns A and B.

For each row, if A < B, then C, D, and E should be set to A; otherwise they should be set to B. I tried the code below, but I'm getting this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

data = [[0, 1, 0, 0, 0],
        [1, 2, 0, 0, 0],
        [2, 0, 0, 0, 0],
        [2, 4, 0, 0, 0],
        [1, 8, 0, 0, 0],
        [3, 2, 0, 0, 0]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
print(df)

   A  B  C  D  E
0  0  1  0  0  0
1  1  2  0  0  0
2  2  0  0  0  0
3  2  4  0  0  0
4  1  8  0  0  0
5  3  2  0  0  0

list_1 = ['C', 'D', 'E']
for i in df[list_1]:
    # raises ValueError: df['A'] < df['B'] is a Series of booleans, not one bool
    if df['A'] < df['B']:
        df[i] = df['A']
    else:
        df[i] = df['B']
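A minimal reproduction of why the loop fails, using a hypothetical three-row frame: the comparison produces one Boolean per row, and an if statement cannot turn that whole Series into a single True or False.

```python
import pandas as pd

# hypothetical small frame, just enough to trigger the error
df = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 0]})

cond = df['A'] < df['B']      # one bool per row, not a single bool
print(cond.tolist())          # [True, True, False]

error_message = None
try:
    if cond:                  # `if` needs exactly one truth value
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```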

I'm expecting the following output:

   A  B  C  D  E
0  0  1  0  0  0
1  1  2  1  1  1
2  2  0  0  0  0
3  2  4  2  2  2
4  1  8  1  1  1
5  3  2  2  2  2

score 1 · Answer 1 · answered Nov 02 '19 at 19:14

np.where: returns elements chosen from A or B depending on a condition.
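A tiny NumPy-only illustration of that element-wise choice, with made-up arrays:

```python
import numpy as np

cond = np.array([True, False, True])
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# element-wise: take from `a` where cond is True, otherwise from `b`
picked = np.where(cond, a, b)
print(picked.tolist())  # [1, 20, 3]
```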

df.assign: assigns new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

import numpy as np

# A where A < B, otherwise B, computed for every row at once
nums = np.where(df.A < df.B, df.A, df.B)
df = df.assign(C=nums, D=nums, E=nums)
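To avoid writing one keyword per column when there are many target columns, a sketch (assuming the column names sit in a hypothetical target_cols list) can unpack a dict into assign:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3], 'B': [1, 2, 0, 4, 8, 2],
                   'C': 0, 'D': 0, 'E': 0})
target_cols = ['C', 'D', 'E']  # hypothetical list; could hold any number of names

nums = np.where(df.A < df.B, df.A, df.B)
# dict.fromkeys maps every name to the same array; ** passes them as keyword arguments
df = df.assign(**dict.fromkeys(target_cols, nums))
print(df)
```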

Aayush Jain

Good first answer @Ayush – Vishnudev Krishnadas Nov 02 '19 at 19:16
So if there were 20 columns, would you make 20 assignments? – ansev Nov 02 '19 at 19:47

score 1 · Answer 2 · answered Nov 02 '19 at 19:43

Use DataFrame.mask:

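cols = ['C', 'D', 'E']
cond = df['A'] < df['B']
# take A where A < B, otherwise B; axis=0 aligns each Series on the row index
df[cols] = df[cols].mask(cond, df['A'], axis=0).mask(~cond, df['B'], axis=0)
print(df)

   A  B  C  D  E
0  0  1  0  0  0
1  1  2  1  1  1
2  2  0  0  0  0
3  2  4  2  2  2
4  1  8  1  1  1
5  3  2  2  2  2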
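How mask behaves on its own, shown on a hypothetical toy frame: cells are replaced only where the condition is True and every other cell keeps its original value, so a single mask call with A as the replacement covers only the A < B rows.

```python
import pandas as pd

# hypothetical toy frame: two target columns initialised to 0
frame = pd.DataFrame({'X': [0, 0, 0], 'Y': [0, 0, 0]})
cond = pd.Series([True, False, True])
fill = pd.Series([10, 20, 30])

# axis=0 aligns `fill` with the row index and broadcasts it across the columns;
# row 1 (cond False) keeps its original 0s
masked = frame.mask(cond, fill, axis=0)
print(masked)
```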

ansev

Vishnudev Krishnadas · Answer 3 · 2019-11-02T19:16:46.190

I'm not sure what you are trying to achieve here, because the condition df['A'] < df['B'] gives the same result on every iteration of your loop. Just for the sake of understanding:

When you write if df['A'] < df['B']:

the if statement expects a single Boolean, but df['A'] < df['B'] gives a Series of Boolean values, one per row. pandas will not guess how to collapse that into one truth value, so the error asks you to reduce it explicitly, using something like

if (df['A'] < df['B']).all():

OR

if (df['A'] < df['B']).any():
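Running both reductions on the question's data shows what each returns. Whichever branch the if then takes is applied to every row, so neither reduction gives the row-by-row result the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3], 'B': [1, 2, 0, 4, 8, 2]})
cond = df['A'] < df['B']

# each reduction collapses the Series to one bool, so `if` accepts it
print(cond.any())  # True: at least one row has A < B
print(cond.all())  # False: rows 2 and 5 have A >= B
```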

edited Nov 02 '19 at 19:16

answered Nov 02 '19 at 18:57

Vishnudev Krishnadas

score 0 · Answer 4 · answered Nov 02 '19 at 18:59

Personally, I always use .apply to modify columns based on other columns:

list_1 = ['C', 'D', 'E']
for i in list_1:
    # row-wise: take A if A < B, otherwise B (column labels are uppercase)
    df[i] = df.apply(lambda x: x.A if x.A < x.B else x.B, axis=1)

giulio

Is apply efficient on huge datasets? – Shanoo Nov 02 '19 at 19:48
@Shanoo my reference is this post (https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – giulio Nov 02 '19 at 20:20
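On the efficiency question: apply with axis=1 calls the Python lambda once per row, whereas a vectorized np.where (swapped in here, not part of this answer) works on whole columns at once. A sketch checking that both give the same values on the question's data; no timings are claimed, since speedups depend on the data and setup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3], 'B': [1, 2, 0, 4, 8, 2]})

# row by row: the lambda runs once per row in Python
via_apply = df.apply(lambda x: x.A if x.A < x.B else x.B, axis=1)

# vectorized: one NumPy operation over the whole columns
via_where = pd.Series(np.where(df.A < df.B, df.A, df.B), index=df.index)

print(via_apply.tolist() == via_where.tolist())
```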

shev.m · Answer 5 · 2019-11-02T19:49:25.933

0

What I would do is create a DataFrame with only columns 'A' and 'B', and then create column 'C' like this:

df['C'] = df.min(axis=1)

Columns 'D' and 'E' seem redundant, since they would only duplicate 'C'.

If you have to start with all the columns and need all of them in the output, you can do the following:

df['C'] = df[['A', 'B']].min(axis=1)
df['D'] = df['C']
df['E'] = df['C']
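The rule "A if A < B, else B" is exactly the smaller of the two values (when A == B both branches give the same number), which is why the row-wise minimum works. A sketch that checks this on the question's data and fills any list of target columns in a loop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 2, 1, 3], 'B': [1, 2, 0, 4, 8, 2],
                   'C': 0, 'D': 0, 'E': 0})

row_min = df[['A', 'B']].min(axis=1)
rule = np.where(df['A'] < df['B'], df['A'], df['B'])  # the question's rule

# the minimum and the rule agree on every row
print((row_min.to_numpy() == rule).all())

for col in ['C', 'D', 'E']:
    df[col] = row_min
```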

edited Nov 02 '19 at 19:49

answered Nov 02 '19 at 19:33

shev.m

Mykola Zotko · Answer 6 · 2019-11-02T20:03:00.117

0

You can use NumPy's where function:

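import numpy as np

# reshape(-1, 1) turns the result into one column so it broadcasts across C, D and E
df.loc[:, 'C':'E'] = np.where(df['A'] < df['B'], df['A'], df['B']).reshape(-1, 1)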
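Why reshape(-1, 1) matters, shown with NumPy alone: a (6, 1) array broadcasts across 3 columns by repeating each row's value, whereas a flat (6,) array would be lined up against the 3 columns and fail to broadcast. The array values below are the np.where result for the question's data.

```python
import numpy as np

col = np.array([0, 1, 0, 2, 1, 2])  # shape (6,): the np.where result
as_column = col.reshape(-1, 1)      # shape (6, 1): one value per row

# a (6, 1) array broadcasts to (6, 3), repeating each row's value
block = np.broadcast_to(as_column, (6, 3))
print(block.shape)         # (6, 3)
print(block[5].tolist())   # [2, 2, 2]
```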

edited Nov 02 '19 at 20:03

answered Nov 02 '19 at 19:47

Mykola Zotko
