I have a function that returns a dict, and I would like to use pandas/NumPy vectorization to apply it across every row of a DataFrame. The function's inputs are columns of the DataFrame, and I want its outputs to become new columns on the same DataFrame. An example is below.

import pandas as pd

def func(a, b, c):
    return {
        "a_calc": a * 2,
        "b_calc": b * 3,
        "c_calc": c * 4,
    }

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])
   a  b  c
0  1  2  3
1  4  5  6

Desired Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

I was reading this answer and it got me most of the way there, but I couldn't figure out how to handle a function that returns a dict whose keys are the desired column names.

Austin Ulfers

2 Answers

Let's use some dataframe unpacking:

df.join(pd.DataFrame(func(**df)))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

Or really getting cute:

df.assign(**func(**df))
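
To make the unpacking explicit: `**df` works because a DataFrame exposes `keys()` (its column labels) and label-based `[]` lookup, so each column is passed to `func` as a keyword argument holding a Series. The sketch below assumes the column labels are strings that match `func`'s parameter names exactly. The extra column `d` and the name `df_extra` are hypothetical, added only to show that unrelated columns need to be filtered out first.

```python
import pandas as pd

def func(a, b, c):
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# func(**df) is shorthand for passing every column by name.
explicit = func(a=df["a"], b=df["b"], c=df["c"])
unpacked = func(**df)
same = all(explicit[key].equals(unpacked[key]) for key in explicit)

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(**unpacked)
print(result)

# Hypothetical extra column "d": func(**df_extra) would raise a TypeError
# (unexpected keyword argument), so select the input columns first.
df_extra = df.assign(d=0)
result_extra = df_extra.assign(**func(**df_extra[["a", "b", "c"]]))
print(result_extra)
```

Both `join` and `assign` return a new DataFrame, so reassign the result (for example `df = df.assign(...)`) if the new columns should persist.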
Scott Boston

If you cannot modify your function, you can do:

df.join(pd.DataFrame(func(df['a'], df['b'], df['c']), index=df.index))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

Note: this exploits the fact that func accepts Series input and operates on whole columns at once. In the general case, when the function only works on scalar values, you need a for loop:

df.join(pd.DataFrame([func(x['a'], x['b'], x['c']) for _, x in df.iterrows()],
                     index=df.index))
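
As a variation on the loop above, here is a minimal sketch that iterates over the columns with `zip` instead of `iterrows`. `iterrows` builds a Series for every row, which can change dtypes when the columns have mixed types, while `zip` hands the function plain scalar values. The function `scalar_func` is hypothetical: it stands in for any function whose `if` statement prevents it from accepting a whole Series. The string index `["x", "y"]` is also an assumption, chosen to show why `index=df.index` matters: a DataFrame built from a list of dicts gets a fresh 0..n-1 index, and `join` aligns on the index.

```python
import pandas as pd

def scalar_func(a, b, c):
    # Hypothetical scalar-only function: truth-testing a Series in this
    # `if` would raise a ValueError, so it cannot be called on columns.
    scale = 2 if a < 3 else 1
    return {"a_calc": a * scale, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame(
    [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}],
    index=["x", "y"],
)

# Call the function once per row, feeding it one scalar from each column.
rows = [scalar_func(a, b, c) for a, b, c in zip(df["a"], df["b"], df["c"])]

# Reuse the original index so that join lines the rows up correctly.
calc = pd.DataFrame(rows, index=df.index)
result = df.join(calc)
print(result)
```

This still runs a Python-level loop, so it is not vectorized; it only reduces the per-row overhead compared with `iterrows`.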
Quang Hoang