
I have a DataFrame:

    A  B
1:  0  1
2:  0  0
3:  1  1
4:  0  1
5:  1  0

I want to replace each value in column A with the corresponding value from column B wherever the value in column A equals 0.

DataFrame I want to get:

    A  B
1:  1  1
2:  0  0
3:  1  1
4:  1  1
5:  1  0

I've already tried this code:

df['A'] = df['B'].apply(lambda x: x if df['A'] == 0 else df['A'])

It raises an error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

sailestim
  • Not sure this is a duplicate. The linked duplicate is about adding a new column based on another column. This is about updating an existing column (and is easier to find via google). @sailestim My apologies that this was marked as a duplicate. Please keep the questions coming. – informaton Aug 30 '22 at 15:26
  • Answers below use both dot and bracket notation, some references suggest brackets are better: https://www.dataschool.io/pandas-dot-notation-vs-brackets/ https://stackoverflow.com/questions/41030013/pandas-dataframe-where-clause-with-dot-versus-brackets-column-selection – Casey Sep 27 '22 at 18:34

3 Answers

df['A'] = df.apply(lambda x: x['B'] if x['A']==0 else x['A'], axis=1)

Output

    A  B
1:  1  1
2:  0  0
3:  1  1
4:  1  1
5:  1  0
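
For context, here is a minimal sketch of why the attempt in the question fails while the row-wise version works. It assumes the five-row frame from the question, with the 1..5 index copied from the displayed row labels. Inside the original lambda, `df['A'] == 0` compares the whole column, so `if` receives a Series instead of a single boolean. With `axis=1`, the lambda sees one row at a time and the comparison yields a scalar.

```python
import pandas as pd

# Frame from the question; the 1..5 index mirrors the displayed row labels.
df = pd.DataFrame({"A": [0, 0, 1, 0, 1], "B": [1, 0, 1, 1, 0]}, index=range(1, 6))

# df["A"] == 0 is a boolean Series, so using it directly in `if` raises ValueError.
try:
    if df["A"] == 0:
        pass
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)

# With axis=1 the lambda receives one row at a time, so row["A"] is a scalar.
result = df.apply(lambda row: row["B"] if row["A"] == 0 else row["A"], axis=1)
df["A"] = result
print(df)
```
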
Rushabh Mehta

Use np.where:

In [348]: df.A = np.where(df.A.eq(0), df.B, df.A)

In [349]: df
Out[349]:
    A  B
1:  1  1
2:  0  0
3:  1  1
4:  1  1
5:  1  0
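
A self-contained sketch of the same idea, written with bracket notation (as one of the comments on the question suggests) and with the imports spelled out. The `Series.mask` line is an alternative I am adding for comparison, not part of the answer above. The frame here uses the default 0..4 index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, 0, 1], "B": [1, 0, 1, 1, 0]})

# np.where(condition, value_if_true, value_if_false) works element-wise
# and returns a NumPy array of the same length as the condition.
via_where = np.where(df["A"].eq(0), df["B"], df["A"])

# Pandas-only alternative: Series.mask replaces values where the condition is True.
via_mask = df["A"].mask(df["A"].eq(0), df["B"])

# Assign the result back to column A.
df["A"] = via_where
print(df)
```

Both variants are vectorized, so neither calls a Python function per row.
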
Zero

You can perform this by using a mask:

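import pandas as pd
# Build the example frame from the question
df = pd.DataFrame()
df['A'] = [0,0,1,0,1]
df['B'] = [1,0,1,1,0]
# Boolean mask selecting the rows where A is 0
mask = (df.A == 0)
# Overwrite A with B only on the masked rows
df.loc[mask,'A'] = df.loc[mask,'B']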

    A   B
0   1   1
1   0   0
2   1   1
3   1   1
4   1   0

EDIT: OK, this is actually an inefficient solution:

%timeit df.loc[mask,'A'] = df.loc[mask,'B']
%timeit df.apply(lambda x: x['B'] if x['A']==0 else x['A'], axis=1)
%timeit np.where(df.A.eq(0), df.B, df.A)

5.52 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.27 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
796 µs ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So thanks to Zero for this efficient solution with np.where!
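
The timings above were taken on a five-row frame, where fixed overhead may dominate, so the ranking could differ on larger data. Below is a rough sketch for repeating the comparison on a bigger frame. The frame size, random seed, and helper names (`with_loc`, `with_apply`, `with_np_where`) are my own assumptions and are not from the answers. No results are quoted here, since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame; the size and seed are arbitrary choices.
rng = np.random.default_rng(0)
n = 10_000
base = pd.DataFrame({"A": rng.integers(0, 2, n), "B": rng.integers(0, 2, n)})

def with_loc(df):
    # Boolean-mask assignment on a copy, so repeated runs see the same input.
    out = df.copy()
    mask = out["A"] == 0
    out.loc[mask, "A"] = out.loc[mask, "B"]
    return out["A"]

def with_apply(df):
    # Row-wise Python lambda; calls the function once per row.
    return df.apply(lambda row: row["B"] if row["A"] == 0 else row["A"], axis=1)

def with_np_where(df):
    # Vectorized element-wise selection, wrapped back into a Series.
    return pd.Series(np.where(df["A"].eq(0), df["B"], df["A"]), index=df.index)

for fn in (with_loc, with_apply, with_np_where):
    seconds = timeit.timeit(lambda: fn(base), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

Note that `with_loc` includes the cost of copying the frame, which the original `%timeit` line did not.
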

ysearka