I have a question which is closely related to this post: Pandas conditional creation of a series/dataframe column
The difference from that question is that I would like to use the value of one column to assign values in MANY other columns. For efficiency reasons, I'd like to avoid writing a for-loop over all entries with many if-statements.
I have a dataset like this:
import pandas as pd
df = pd.DataFrame(columns=['Type', 'Set', 'Q1', 'Q2', 'Q3', 'color', 'number'])
df['Type'] = ['A', 'B', 'B', 'C', 'D', 'E', 'C', 'D']
Which produces:
Type Set Q1 Q2 Q3 color number
0 A NaN NaN NaN NaN NaN NaN
1 B NaN NaN NaN NaN NaN NaN
2 B NaN NaN NaN NaN NaN NaN
3 C NaN NaN NaN NaN NaN NaN
4 D NaN NaN NaN NaN NaN NaN
5 E NaN NaN NaN NaN NaN NaN
6 C NaN NaN NaN NaN NaN NaN
7 D NaN NaN NaN NaN NaN NaN
Based on the information in Type, I want to create values for various other columns.
For example, for Type == 'A', I'd like several different things to happen to the corresponding rows of the dataframe:
df['Set'] = 'Z'
df['Q1'] = 0
df['Q2'] = 0
df['Q3'] = random.choice([True, False])
df['color'] = 'green'
df['number'] = call_on_some_function_I_defined(input=df['Q1'])
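A minimal sketch of the Type == 'A' rules without a row loop, assuming the other columns are object dtype so strings, integers and booleans can share a column. A boolean mask selects the matching rows once, a single df.loc call fills several constant columns, and the random Q3 values come from numpy's Generator.choice with size= so that each matching row gets its own draw (a single random.choice call returns one value, which pandas would broadcast to every selected row). call_on_some_function_I_defined is a hypothetical stand-in for the asker's function, and seed=0 is arbitrary.

```python
import numpy as np
import pandas as pd

# Same frame as in the question, with object columns so mixed values can be stored
df = pd.DataFrame(columns=['Type', 'Set', 'Q1', 'Q2', 'Q3', 'color', 'number'],
                  index=range(8), dtype=object)
df['Type'] = ['A', 'B', 'B', 'C', 'D', 'E', 'C', 'D']

rng = np.random.default_rng(seed=0)


def call_on_some_function_I_defined(values):
    # Hypothetical stand-in for the asker's function; operates on a whole Series
    return values + 1


mask_a = df['Type'] == 'A'

# Constant values: one .loc call fills several columns for all matching rows
df.loc[mask_a, ['Set', 'Q1', 'Q2', 'color']] = ['Z', 0, 0, 'green']

# Random values: one independent draw per matching row
df.loc[mask_a, 'Q3'] = rng.choice([True, False], size=int(mask_a.sum()))

# Derived values: pass the already-filled slice of Q1 to the function
df.loc[mask_a, 'number'] = call_on_some_function_I_defined(df.loc[mask_a, 'Q1'])
```

Rows of other Types are left untouched (still NaN), so the same three steps can be repeated with a different mask for each Type.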
When Type == 'B', I'd like other things to happen to those same columns:
df['Set'] = 'X'
df['Q1'] = random.choice([0, 250, 500, 750, 1000])
etc.
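The Type == 'B' rules can follow the same mask-based pattern; the part that differs is the random quantities. A sketch under the same assumptions (object-dtype columns, numpy's default_rng as the random source, hypothetical call_on_some_function_I_defined), with the Q2, Q3, color and number rules taken from the choices_B list in the question. rng.choice(levels, size=n_b) draws one value per matching row, so the two B rows can end up with different Q1 and Q2 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Type', 'Set', 'Q1', 'Q2', 'Q3', 'color', 'number'],
                  index=range(8), dtype=object)
df['Type'] = ['A', 'B', 'B', 'C', 'D', 'E', 'C', 'D']

rng = np.random.default_rng(seed=0)
levels = [0, 250, 500, 750, 1000]


def call_on_some_function_I_defined(values):
    # Hypothetical stand-in for the asker's function; operates on a whole Series
    return values + 1


mask_b = df['Type'] == 'B'
n_b = int(mask_b.sum())

# Constants for Type B
df.loc[mask_b, ['Set', 'Q3', 'color']] = ['X', False, 'red']

# One independent draw per matching row
df.loc[mask_b, 'Q1'] = rng.choice(levels, size=n_b)
df.loc[mask_b, 'Q2'] = rng.choice(levels, size=n_b)

# For Type B, number is derived from Q2
df.loc[mask_b, 'number'] = call_on_some_function_I_defined(df.loc[mask_b, 'Q2'])
```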
Ideally, I'd like to do something along these lines:
import numpy as np
import random
conditions = [
    (df['Type'] == 'A'),
    (df['Type'] == 'B'),
    (df['Type'] == 'C')]  # etc.
# Pseudocode: assignments inside a list are not valid Python
choices_A = [df['Set'] = 'Z', df['Q1'] = 0, df['Q2'] = 0, df['Q3'] = random.choice([True, False]), df['color'] = 'green', df['number'] = call_on_some_function_I_defined(input=df['Q1'])]
choices_B = [df['Set'] = 'X', df['Q1'] = random.choice([0, 250, 500, 750, 1000]), df['Q2'] = random.choice([0, 250, 500, 750, 1000]), df['Q3'] = False, df['color'] = 'red', df['number'] = call_on_some_function_I_defined(input=df['Q2'])]
df = np.select(conditions[0], choices_A, default=0)
df = np.select(conditions[1], choices_B, default=0)
To create output like:
Type Set Q1 Q2 Q3 color number
0 A Z 0 0 True green 17
1 B X 750 0 False red 85
2 B X 500 250 False red 93 #etc
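In the target output, Set and color depend only on Type. Columns like these can be filled for all rows in one step by merging a small lookup table on Type. This is a sketch assuming a left merge is acceptable; the lookup values are taken from the question, and Types without an entry (C, D and E here) end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'B', 'B', 'C', 'D', 'E', 'C', 'D']})

# One row per Type, one column per target column whose value is fixed for that Type
lookup = pd.DataFrame({
    'Type': ['A', 'B'],
    'Set': ['Z', 'X'],
    'color': ['green', 'red'],
})

# how='left' keeps every row of df in its original key order; unmatched Types get NaN
df = df.merge(lookup, on='Type', how='left')
```

The random and derived columns (Q1, Q2, Q3, number) still need a per-row step, such as a boolean mask combined with df.loc.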
While numpy.select, with its conditions and choices, is perfect for conditionally assigning values to a single dataframe column, I haven't found a neat way to use the same conditions to assign values to multiple dataframe columns.
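Because np.select returns one array per call, one way to keep the conditions/choices structure is to store one choice list per target column and call np.select once per column. The loop then runs over a handful of columns rather than over rows. This sketch assumes rules only for Types A and B, uses numpy's default_rng for the random values, and uses a hypothetical call_on_some_function_I_defined. Random arrays are drawn for every row, and np.select keeps only the entries where the matching condition is true. Numeric columns use np.nan as the default so that number can be computed on the whole column afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': ['A', 'B', 'B', 'C', 'D', 'E', 'C', 'D']})
rng = np.random.default_rng(seed=0)
n = len(df)
levels = [0, 250, 500, 750, 1000]


def call_on_some_function_I_defined(values):
    # Hypothetical stand-in for the asker's function; operates on a whole Series
    return values + 1


conditions = [df['Type'] == 'A', df['Type'] == 'B']

# For each target column: (choices in the same order as `conditions`, default)
column_rules = {
    'Set':   (['Z', 'X'], None),
    'Q1':    ([0, rng.choice(levels, size=n)], np.nan),
    'Q2':    ([0, rng.choice(levels, size=n)], np.nan),
    'Q3':    ([rng.choice([True, False], size=n), False], None),
    'color': (['green', 'red'], None),
}

for col, (col_choices, default) in column_rules.items():
    df[col] = np.select(conditions, col_choices, default=default)

# number depends on Q1 (Type A) or Q2 (Type B), so it is computed after those exist
df['number'] = np.select(
    conditions,
    [call_on_some_function_I_defined(df['Q1']),
     call_on_some_function_I_defined(df['Q2'])],
    default=np.nan,
)
```

Adding a Type means appending one condition and one entry to each choice list, keeping every list in the same order as conditions.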