
I keep running into the same problem. I have a dataframe with two particular columns, and I want to create a new column that takes its value from one column or the other, row by row, depending on a boolean expression. A simple example:

import pandas as pd
df = pd.DataFrame({'A': ['Fear', 'Surprise', 'Fear', 'Fear'], 
                   'B': ['Efficiency', 'Efficiency', 'Efficiency', 'Devotion'], 
                   'C': [True, True, False, False]})
df['D'] = df['A'] if df['C'] else df['B']

That last line is pseudocode for what I want to do. If I run this code, the last line raises a ValueError complaining that the truth value of a Series is ambiguous. The desired output in column 'D' is ['Fear', 'Surprise', 'Efficiency', 'Devotion'].

I have a feeling there is a simple solution to this problem, but I have yet to come across it.

mermaldad
  • Use `df['D'] = np.where(df['C'], df['A'], df['B'])` – jezrael Oct 29 '19 at 13:09
  • Thanks, that works. I figured it was something simple like that but I couldn't configure a search which would get me there. – mermaldad Oct 29 '19 at 13:21
  • Incidentally, shortly after posting, I came up with my own solution, which is more algebraic: `df['D'] = df['A'] * df['C'] + df['B'] * ~df['C']` – mermaldad Oct 29 '19 at 13:23
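
For anyone landing here later, this is a minimal runnable sketch of the `np.where` suggestion from the comments, applied to the dataframe in the question. The `Series.where` variant and the column name `D_alt` are not from the thread; they are added here only as a possible pandas-only alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Fear', 'Surprise', 'Fear', 'Fear'],
                   'B': ['Efficiency', 'Efficiency', 'Efficiency', 'Devotion'],
                   'C': [True, True, False, False]})

# np.where evaluates the condition element by element:
# take the value from A where C is True, otherwise from B.
df['D'] = np.where(df['C'], df['A'], df['B'])

# Assumed alternative (not from the thread): Series.where keeps A
# where C is True and fills the remaining rows from B.
df['D_alt'] = df['A'].where(df['C'], df['B'])

print(df['D'].tolist())  # ['Fear', 'Surprise', 'Efficiency', 'Devotion']
```

Both forms avoid the ambiguity error because they never ask Python for a single truth value of the whole column `C`; the choice is made per row.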
