
I'd like to apply a function with multiple returns to a pandas DataFrame and put the results in separate new columns in that DataFrame.

So given something like this:

import pandas as pd

df = pd.DataFrame(data = {'a': [1, 2, 3], 'b': [4, 5, 6]})

def add_subtract(a, b):
  return (a + b, a - b)

The goal is a single command that calls add_subtract on a and b to create two new columns in df: sum and difference.

I thought something like this might work:

(df['sum'], df['difference']) = df.apply(
    lambda row: add_subtract(row['a'], row['b']), axis=1)

But it yields this error:

----> 9 lambda row: add_subtract(row['a'], row['b']), axis=1)

ValueError: too many values to unpack (expected 2)
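The error comes from what apply returns here. Because add_subtract returns a tuple, df.apply(..., axis=1) produces a single Series holding one tuple per row. With three rows, that Series has three elements, so unpacking it into two targets fails. The sketch below is one option that does not appear in the answers; it assumes pandas 0.23 or later. It regroups that Series of tuples with zip(*...) before unpacking:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})


def add_subtract(a, b):
    return (a + b, a - b)


# apply with axis=1 returns ONE Series that holds a tuple per row
result = df.apply(lambda row: add_subtract(row['a'], row['b']), axis=1)
print(type(result).__name__, len(result))  # Series 3

# Unpacking those 3 tuples into 2 names is what raises
# "too many values to unpack (expected 2)".
# zip(*result) regroups them by position: all sums first, then all differences.
df['sum'], df['difference'] = zip(*result)
print(df)
```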

EDIT: In addition to the answers below, the question "pandas apply function that returns multiple values to rows in pandas dataframe" shows that the function can be modified to return a list or a Series, i.e.:

def add_subtract_list(a, b):
  return [a + b, a - b]

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_list(row['a'], row['b']), axis=1,
    result_type='expand')  # pandas >= 0.23: turn each returned list into columns

or

def add_subtract_series(a, b):
  return pd.Series((a + b, a - b))

df[['sum', 'difference']] = df.apply(
    lambda row: add_subtract_series(row['a'], row['b']), axis=1)

both work (the latter being equivalent to Wen's accepted answer).
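Since pandas 0.23, DataFrame.apply also accepts result_type='expand'. This option turns list-like results into columns, including the tuple that add_subtract already returns, so neither the function nor the lambda needs to change. A short sketch, assuming pandas 0.23 or later:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})


def add_subtract(a, b):
    return (a + b, a - b)


# result_type='expand' makes apply return a DataFrame with columns 0 and 1
# instead of a Series of tuples.
expanded = df.apply(lambda row: add_subtract(row['a'], row['b']),
                    axis=1, result_type='expand')

# Assigning a DataFrame to a list of new column names pairs them by position.
df[['sum', 'difference']] = expanded
print(df)
```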

mrnovember
Max Ghenis

2 Answers


Adding pd.Series

df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)
df

yields

   a  b  sum  difference
0  1  4    5          -3
1  2  5    7          -3
2  3  6    9          -3
Max Ghenis
BENY
  • Thank you! Can you explain why `pd.Series` is needed here? – Max Ghenis Dec 26 '17 at 21:05
  • @MaxGhenis Your function returns a tuple, so we pass that tuple to pd.Series. apply then expands each row's Series into columns, turning what would otherwise be a single column of tuples into two columns (a DataFrame). More info: https://stackoverflow.com/questions/29550414/how-to-split-column-of-tuples-in-pandas-dataframe – BENY Dec 26 '17 at 21:14
  • I wonder if `row['a']` and `row['b']` will actually work. Usually this kind of reference should not work inside of `apply()` – Federico Dorato Nov 30 '20 at 13:55
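The sketch below shows what the pd.Series wrapper changes and checks the concern raised in the last comment. It assumes a current pandas version. With axis=1, apply passes each row to the lambda as a Series indexed by the column labels, so row['a'] is an ordinary label lookup:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})


def add_subtract(a, b):
    return (a + b, a - b)


# Each row handed to the lambda is a Series indexed by 'a' and 'b'.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
sums = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(row_types.tolist())  # ['Series', 'Series', 'Series']
print(sums.tolist())       # [5, 7, 9]

# Returning the raw tuple: apply gives back a Series of tuples (one column).
as_tuples = df.apply(lambda row: add_subtract(row['a'], row['b']), axis=1)

# Wrapping the tuple in pd.Series: apply expands the results into a DataFrame
# whose columns are that Series' index (0 and 1 here).
as_frame = df.apply(lambda row: pd.Series(add_subtract(row['a'], row['b'])), axis=1)

print(type(as_tuples).__name__, as_tuples.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 2)
```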

One way to do this would be to use pd.DataFrame.assign as follows:

df.assign(**{k:v for k,v in zip(['sum', 'difference'], add_subtract(df.a, df.b))})

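Should yield the output below. In pandas before 0.23, assign sorted its keyword arguments alphabetically, which is why difference comes before sum here. From 0.23 on (Python 3.6+), the new columns keep the order given, i.e. sum, then difference: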

   a  b  difference  sum
0  1  4          -3    5
1  2  5          -3    7
2  3  6          -3    9
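This works because add_subtract only uses + and -, which pandas applies element-wise to whole columns. As a result, add_subtract(df.a, df.b) returns a tuple of two Series with no row-by-row apply at all. This approach is usually much faster than a row-wise apply on large frames. Note that assign returns a new DataFrame and leaves df unchanged. The sketch below assumes the function is built only from such vectorized operations. Under that assumption, plain tuple unpacking also works and modifies df in place:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})


def add_subtract(a, b):
    return (a + b, a - b)


# Called on whole columns, + and - work element-wise, so the function
# returns a tuple of two Series rather than a tuple of numbers.
total, diff = add_subtract(df['a'], df['b'])
print(type(total).__name__, type(diff).__name__)  # Series Series

# assign builds and returns a new DataFrame; df itself is left unchanged.
new_df = df.assign(sum=total, difference=diff)
print('sum' in df.columns, 'sum' in new_df.columns)  # False True

# Plain tuple unpacking adds both columns to df in place.
df['sum'], df['difference'] = add_subtract(df['a'], df['b'])
print(df)
```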

Clarifications:

zip is a built-in function that returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For instance, list(zip(['sum', 'difference'], [df.a + df.b, df.a - df.b])) should return [('sum', df.a + df.b), ('difference', df.a - df.b)].
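The following is a quick illustration of zip and of the dictionary comprehension built on it. It uses plain Python lists in place of the Series; the values are simply the sums and differences from the example:

```python
names = ['sum', 'difference']
values = [[5, 7, 9], [-3, -3, -3]]

# zip pairs the i-th name with the i-th value.
pairs = list(zip(names, values))
print(pairs)  # [('sum', [5, 7, 9]), ('difference', [-3, -3, -3])]

# The comprehension from the answer turns those pairs into a dict;
# dict(zip(...)) builds the same dict more directly.
columns = {k: v for k, v in zip(names, values)}
print(columns == dict(zip(names, values)))  # True
```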

** in front of a dictionary in a function call unpacks its key-value pairs into keyword arguments. In essence, the unpacking here amounts to passing sum=df.a + df.b, difference=df.a - df.b.
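A small check of that equivalence, assuming a current pandas version: passing a dict with ** and spelling out the keyword arguments produce the same DataFrame.

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
new_cols = {'sum': df.a + df.b, 'difference': df.a - df.b}

# ** turns each key-value pair into a keyword argument, so these two calls
# pass exactly the same arguments to assign.
unpacked = df.assign(**new_cols)
explicit = df.assign(sum=df.a + df.b, difference=df.a - df.b)
print(unpacked.equals(explicit))  # True
```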

Put together, the call is equivalent to the following:

df.assign(sum=df.a + df.b, difference=df.a - df.b)

See the Python documentation on zip and on ** dictionary unpacking in function calls for a better idea of how these tools work beyond this particular example.

Abdou
  • This is intriguing: I'm relatively new to Python (mostly an R user), so could you explain what the `**` and `zip` are doing here? Seems like a useful construct. I accepted Wen's answer as it differed least from my guess, but upvoted this and can change if this would be significantly better performance-wise. – Max Ghenis Dec 26 '17 at 21:04
  • @MaxGhenis You can think of the output of zip in Python as something like a list of lists in R; in R we would need `unlist`. Here is an R example :-) (PS, I am 50% R user too :-) ) https://stackoverflow.com/questions/4227223/r-list-to-data-frame – BENY Dec 26 '17 at 21:36
  • That statement is a bit lacking. Data structures in R are not that easily translated in python data structures. The closest to `zip` I can think of in R is the [`transpose`](http://purrr.tidyverse.org/reference/transpose.html) function from the [`purrr`](http://purrr.tidyverse.org/) package. Even that doesn't really work the same way in all cases. – Abdou Dec 26 '17 at 21:55