6

Given the DataFrame

import pandas as pd

df = pd.DataFrame({
    'transformed': ['left', 'right', 'left', 'right'],
    'left_f': [1, 2, 3, 4],
    'right_f': [10, 20, 30, 40],
    'left_t': [-1, -2, -3, -4],
    'right_t': [-10, -20, -30, -40],
})

I want to create two new columns, picking from either left_* or right_* depending on the content of transformed:

df['transformed_f'] = df['right_f'].where(
    df['transformed'] == 'right',
    df['left_f']
)

df['transformed_t'] = df['right_t'].where(
    df['transformed'] == 'right',
    df['left_t']
)

And I get the expected result

df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0  left              1       10      -1      -10              1             -1
# 1  right             2       20      -2      -20             20            -20
# 2  left              3       30      -3      -30              3             -3
# 3  right             4       40      -4      -40             40            -40

However when I try to do it in one operation I get an unexpected result containing NaN values

df[['transformed_f', 'transformed_t']] = df[['right_f', 'right_t']].where(
    df['transformed'] == 'right',
    df[['left_f', 'left_t']]
)

df
#    transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
# 0  left              1       10      -1      -10            NaN            NaN
# 1  right             2       20      -2      -20           20.0          -20.0
# 2  left              3       30      -3      -30            NaN            NaN
# 3  right             4       40      -4      -40           40.0          -40.0

Is there a way to use df.where() on multiple columns at once?

Nils Werner
  • 34,832
  • 7
  • 76
  • 98

1 Answers1

4

You are close , just add.values or .to_numpy() with the slice to make it an NDarray:

Per docs:

other : scalar, NDFrame, or callable Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the NDFrame and should return scalar or NDFrame. The callable must not change input NDFrame (though pandas doesn’t check it).

So when you directly input the slice of the dataframe, the indexes(col names) dont match and hence it doesn't update the df, when you pass .values , it ignores the indexes and add the values.

df[['transformed_f', 'transformed_t']]=(df[['right_f', 'right_t']].
                        where(df['transformed'] == 'right',df[['left_f', 'left_t']].values))
print(df)

  transformed  left_f  right_f  left_t  right_t  transformed_f  transformed_t
0        left       1       10      -1      -10              1             -1
1       right       2       20      -2      -20             20            -20
2        left       3       30      -3      -30              3             -3
3       right       4       40      -4      -40             40            -40
anky
  • 74,114
  • 11
  • 41
  • 70