Create multiple pandas DataFrame columns from applying a function with multiple returns

Question

I'd like to apply a function with multiple returns to a pandas DataFrame and put the results in separate new columns in that DataFrame.

So given something like this:

import pandas as pd

df = pd.DataFrame(data = {'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
  return (a + b, a - b)

The goal is a single command that calls add_subtract on a and b to create two new columns in df: sum and difference.

I thought something like this might work:

(df['sum'], df['difference']) = df.apply(
    lambda row: add_subtract(row['a'], row['b']), axis=1)

But it yields this error:

----> 9 lambda row: add_subtract(row['a'], row['b']), axis=1)

ValueError: too many values to unpack (expected 2)
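
The error seems to come from the shape of what `apply` returns here: a single Series holding one tuple per row. Unpacking it into two names iterates over the three rows, not over the two positions inside each tuple. A minimal sketch, reusing `df` and `add_subtract` from above (the names `result`, `sums` and `differences` are introduced here for illustration, and the `zip(*...)` transpose is one possible workaround rather than something proposed in the original post):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# apply returns one Series with a tuple per row, not two Series
result = df.apply(lambda row: add_subtract(row['a'], row['b']), axis=1)
print(type(result).__name__, len(result))  # three rows, so three items to unpack
print(result.tolist())

# zip(*...) transposes the row tuples into one tuple per output column
sums, differences = zip(*result)
df['sum'] = list(sums)
df['difference'] = list(differences)
print(df)
```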

EDIT: In addition to the answers below, the question "pandas apply function that returns multiple values to rows in pandas dataframe" shows that the function can be modified to return a list or a Series, e.g.:

def add_subtract_list(a, b):
  return [a + b, a - b]

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_list(row['a'], row['b']), axis=1)

or

def add_subtract_series(a, b):
  return pd.Series((a + b, a - b))

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_series(row['a'], row['b']), axis=1)

both work (the latter being equivalent to Wen's accepted answer).
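
As far as I can tell, the list-returning variant depends on the pandas version: since pandas 0.23, a function that returns a list produces a single Series of lists by default, and `apply` has a `result_type` parameter to request expansion into columns. A hedged sketch, assuming pandas 0.23 or later (the variable `expanded` is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract_list(a, b):
    return [a + b, a - b]

# result_type='expand' turns each list-like result into columns 0, 1, ...
expanded = df.apply(
    lambda row: add_subtract_list(row['a'], row['b']),
    axis=1, result_type='expand')

# The two expanded columns are assigned to the new names by position
df[['sum', 'difference']] = expanded
print(df)
```

The Series-returning variant should not need this, since a returned Series is expanded into columns by default.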

score 8 · Accepted Answer · edited Dec 26 '17 at 21:00

Wrap the function's return value in pd.Series, so that apply expands it into columns:

df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)
df

yields

   a  b  sum  difference
0  1  4    5          -3
1  2  5    7          -3
2  3  6    9          -3
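
To see why this works, it can help to look at the intermediate objects. With `axis=1`, each `row` passed to the lambda is a Series indexed by column label, so `row['a']` is a plain label lookup; and because the lambda returns a Series, `apply` assembles the results into a DataFrame rather than a Series of tuples. A small sketch (the names `first_row` and `expanded` are added here for illustration only):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# Each row handed to the function under axis=1 looks like this
first_row = df.iloc[0]
print(list(first_row.index))  # the column labels

# Returning a Series per row makes apply build a DataFrame
expanded = df.apply(
    lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)
print(type(expanded).__name__)
print(expanded)
```

The intermediate DataFrame has integer column labels, which is why the assignment to `df[['sum', 'difference']]` matches columns by position.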

edited Dec 26 '17 at 21:00 · Max Ghenis
answered Dec 26 '17 at 20:45 · BENY

Thank you! Can you explain why `pd.Series` is needed here? – Max Ghenis Dec 26 '17 at 21:05

@MaxGhenis Your function returns a tuple, so we pass that tuple to pd.Series. apply then expands each returned Series into columns, turning what would be a single column of tuples into a two-column DataFrame. More info: https://stackoverflow.com/questions/29550414/how-to-split-column-of-tuples-in-pandas-dataframe – BENY Dec 26 '17 at 21:14

I wonder if `row['a']` and `row['b']` will actually work. Usually this kind of reference should not work inside of `apply()` – Federico Dorato Nov 30 '20 at 13:55

score 2 · Answer 2 · Abdou · 2017-12-26T21:17:26.960

One way to do this would be to use pd.DataFrame.assign as follows:

df.assign(**{k:v for k,v in zip(['sum', 'difference'], add_subtract(df.a, df.b))})

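Should yield the following (older pandas versions sort the new columns alphabetically, as shown; pandas 0.23+ on Python 3.6+ keeps the keyword order, so sum comes before difference):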

   a  b  difference  sum
0  1  4          -3    5
1  2  5          -3    7
2  3  6          -3    9

Clarifications:

zip is a builtin function that returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For instance, list(zip(['sum', 'difference'], [df.a + df.b, df.a - df.b])) should return [('sum', df.a + df.b), ('difference', df.a - df.b)].

** in front of a dictionary object unpacks its key-value pairs into keyword arguments of the call. In essence, the unpacked arguments look like this: sum=df.a + df.b, difference=df.a - df.b.
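
A pure-Python sketch of these two pieces, independent of pandas (the function `report` and the sample values are hypothetical, made up for this illustration):

```python
def report(total, difference):
    # Hypothetical helper: just formats its two keyword arguments
    return f"total={total}, difference={difference}"

names = ['total', 'difference']
values = (5, -3)

# zip pairs the i-th name with the i-th value
pairs = list(zip(names, values))
print(pairs)  # [('total', 5), ('difference', -3)]

# dict(...) turns the pairs into a mapping, and ** passes each
# key-value pair to the function as a keyword argument
kwargs = dict(pairs)
print(report(**kwargs))  # total=5, difference=-3
```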

In sum, when combined, you get something like the following:

df.assign(sum=df.a + df.b, difference=df.a - df.b)
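
Since the dict comprehension only rebuilds the pairs that `zip` already provides, the same call can probably be written with `dict(zip(...))`. A sketch reusing `df` and `add_subtract` from the question (the names `new_columns` and `result` are introduced here for illustration). Note that `assign` returns a new DataFrame and leaves `df` itself unchanged, so the result has to be captured:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
    return (a + b, a - b)

# add_subtract works on whole columns here, so no row-wise apply is needed
new_columns = dict(zip(['sum', 'difference'], add_subtract(df.a, df.b)))

# assign returns a copy with the extra columns; df keeps only a and b
result = df.assign(**new_columns)
print(result)
print(list(df.columns))
```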

See the Python documentation for both zip and the ** operator in front of a dictionary object to get a better idea of how these useful tools work beyond this particular example.

edited Dec 26 '17 at 21:17
answered Dec 26 '17 at 20:51 · Abdou

This is intriguing: I'm relatively new to Python (mostly an R user), so could you explain what the `**` and `zip` are doing here? Seems like a useful construct. I accepted Wen's answer as it differed least from my guess, but upvoted this and can change if this would be significantly better performance-wise. – Max Ghenis Dec 26 '17 at 21:04

@MaxGhenis You can think of Python's zip as similar to a list of lists in R; in R we would need `unlist`. Here is an R example :-) (PS: I am 50% an R user too :-)) https://stackoverflow.com/questions/4227223/r-list-to-data-frame – BENY Dec 26 '17 at 21:36

That statement is a bit lacking. Data structures in R are not that easily translated into Python data structures. The closest to `zip` I can think of in R is the [`transpose`](http://purrr.tidyverse.org/reference/transpose.html) function from the [`purrr`](http://purrr.tidyverse.org/) package. Even that doesn't really work the same way in all cases. – Abdou Dec 26 '17 at 21:55
