I have been scouring SO for the best way to apply a function that takes multiple separate Pandas DataFrame columns and outputs multiple new columns in that same DataFrame. Let's say I have the following:
def apply_func_to_df(df):
    df[['new_A', 'new_B']] = df.apply(lambda x: transform_func(x['A'], x['B'], x['C']), axis=1)

def transform_func(value_A, value_B, value_C):
    # do some processing and transformation and stuff
    return new_value_A, new_value_B
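For concreteness, here is a toy reproduction of what the call above does. The data and the body of transform_func are just placeholders; the only thing I am relying on is that it returns a 2-tuple per row. Without result_type, apply hands back a Series of tuples that does not line up with the two target columns; with result_type='expand' the assignment seems to go through, but I am not sure this is the idiomatic or performant way:

import pandas as pd

# Placeholder data and transform, only to illustrate the shapes involved;
# the real transform_func does more work but also returns a 2-tuple per row.
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20], 'C': [100, 200]})

def transform_func(value_A, value_B, value_C):
    return value_A + value_B, value_B * value_C

# As written above, apply returns a Series of 2-tuples, which does not
# assign cleanly to the two target columns:
out = df.apply(lambda x: transform_func(x['A'], x['B'], x['C']), axis=1)

# With result_type='expand' the tuples are expanded into a 2-column
# DataFrame, and the assignment works:
df[['new_A', 'new_B']] = df.apply(
    lambda x: transform_func(x['A'], x['B'], x['C']),
    axis=1, result_type='expand')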
I am trying to apply this function as shown above to the whole DataFrame df in order to output 2 NEW columns. However, this should generalize to a use case/function that takes in n DataFrame columns and outputs m new columns to the same DataFrame.
The following are things I have been looking at (with varying degrees of success):
- Create a Pandas Series from the function call, then append it to the existing DataFrame.
- Zip the output columns (roughly sketched after the snippet below), though there are some issues in my current implementation.
- Rewrite the basic function transform_func to explicitly expect a row with fields A, B, C as follows, then apply it to the df:
def transform_func_mod(df_row):
    # do something with df_row['A'], df_row['B'], df_row['C']
    return new_value_A, new_value_B
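For reference, this is roughly how I am zipping the outputs of the row-wise variant (the second approach above); it assumes transform_func_mod returns a 2-tuple per row, as in the snippet:

# Series of 2-tuples, one per row
results = df.apply(transform_func_mod, axis=1)

# Unpack the tuples into two new columns
df['new_A'], df['new_B'] = zip(*results)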
I would like a very general and Pythonic way to accomplish this task, taking performance into account (both memory- and time-wise). I would appreciate any input, as I have been struggling with this due to my unfamiliarity with Pandas.