Creating multiple pandas columns from function returning dict

Question

I have a function that returns a dict, and I would like to use pandas/NumPy vectorization to apply it across every row of a DataFrame. The function's inputs are columns of the DataFrame, and I want its outputs to become new columns on the same DataFrame. Below is an example.

import pandas as pd

def func(a, b, c):
    return {
        "a_calc": a * 2,
        "b_calc": b * 3,
        "c_calc": c * 4,
    }

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])
print(df)

   a  b  c
0  1  2  3
1  4  5  6

Desired Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

I was reading this answer and it got me most of the way there, but I couldn't figure out how to handle a function that returns a dict whose keys are the desired column names.

score 4 · Accepted Answer · answered Apr 07 '21 at 02:06


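Let's use some DataFrame unpacking. `**df` passes each column to `func` as a keyword argument, so this works as long as the column names match the function's parameter names: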

df.join(pd.DataFrame(func(**df)))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

Or really getting cute:

df.assign(**func(**df))
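
For reference, here is a minimal runnable sketch of both one-liners using the question's `func` and sample data. The intermediate names `calc`, `joined` and `assigned` are only illustrative and not part of the original answer.

```python
import pandas as pd

def func(a, b, c):
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# **df unpacks the columns as keyword arguments,
# i.e. func(a=df["a"], b=df["b"], c=df["c"])
calc = func(**df)  # dict mapping new column name -> Series

joined = df.join(pd.DataFrame(calc))  # variant 1: build a frame, then join
assigned = df.assign(**calc)          # variant 2: pass the dict to assign

print(joined)
```

If the DataFrame has columns that are not parameters of the function, `func(**df)` raises a `TypeError`; selecting the inputs first, e.g. `func(**df[["a", "b", "c"]])`, should avoid that.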


Scott Boston


score 2 · Answer 2 · Quang Hoang · 2021-04-07T02:20:18.857

If you cannot modify your function, you can do:

df.join(pd.DataFrame(func(df['a'], df['b'], df['c']), index=df.index))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

Note: this exploits the fact that func accepts Series inputs and operates on them element-wise. In the general case, where the function only handles scalars, you need a for loop:

pd.DataFrame([func(x['a'], x['b'], x['c']) for _, x in df.iterrows()],
              index=df.index)
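
As a rough sketch of the scalar-only case, the loop result can be joined back onto the original frame. Here `scalar_func` is a hypothetical stand-in for a function that cannot take Series input (the conditional fails on a Series), and `to_dict("records")` is swapped in for `iterrows()` so that each row arrives as a plain dict of scalars.

```python
import pandas as pd

def scalar_func(a, b, c):
    # Hypothetical example: the conditional only works on scalar `a`,
    # so this function cannot be called with whole columns.
    scale = 2 if a < 3 else 20
    return {"a_calc": a * scale, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# One dict per row; the keys match scalar_func's parameter names.
rows = df.to_dict("records")
calc = pd.DataFrame([scalar_func(**row) for row in rows], index=df.index)

result = df.join(calc)
print(result)
```

One reason to prefer records over `iterrows()` here: `iterrows()` builds a Series per row, which can upcast values when the columns have mixed dtypes. Either way this is still a Python-level loop, so it will not be as fast as the vectorized version.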

edited Apr 07 '21 at 02:20

answered Apr 07 '21 at 01:56

Quang Hoang


In my case, `func()` cannot accept series data. Is there any other way besides using `.iterrows()`? – Austin Ulfers Apr 07 '21 at 02:11
@AustinUlfers Can you rewrite your function to accept a pd.Series? – Scott Boston Apr 07 '21 at 02:12
@ScottBoston It's a relatively long function so I would like to avoid having to rewrite it if possible. – Austin Ulfers Apr 07 '21 at 02:18
@AustinUlfers I think the only other way is to use lambda with axis=1 which is woefully slow and inefficient. – Scott Boston Apr 07 '21 at 02:19
@ScottBoston ok, good to know. I'll go down the rewriting path then because I need this to be quick. – Austin Ulfers Apr 07 '21 at 02:20

@AustinUlfers `iterrows`, `apply`, vanilla `for` loop are essentially equivalent. You could avoid one or another, but in general you would need to use one. – Quang Hoang Apr 07 '21 at 02:21
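
For completeness, the `apply` with `axis=1` variant mentioned in the comments could look roughly like the sketch below. Wrapping the returned dict in `pd.Series` is an assumption made here so that `apply` expands the keys into columns; as the comments say, this runs the function once per row and should not be expected to be faster than an explicit loop.

```python
import pandas as pd

def func(a, b, c):
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# apply(axis=1) calls the lambda once per row; returning a Series
# makes pandas turn the dict keys into columns of the result.
calc = df.apply(
    lambda row: pd.Series(func(row["a"], row["b"], row["c"])),
    axis=1,
)

out = df.join(calc)
print(out)
```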
