I have a function returning a tuple of values, as an example:

def dumb_func(number):
    return number+1,number-1

I'd like to apply it to a pandas DataFrame

import pandas as pd
df = pd.DataFrame({'numbers': [1, 2, 3, 4, 5, 6, 7]})
test = df['numbers'].apply(dumb_func)

The result is that test is a pandas Series containing tuples. Is there a way to use the variable test, or to replace it, so that the results of the function are assigned to two distinct columns, 'number_plus_one' and 'number_minus_one', of the original DataFrame?

Stefano
    [Previous solutions](https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns/47097625#47097625) are apparently slower when creating multiple columns (see Ted Petrou's answer). – DarrylG May 24 '20 at 17:52

2 Answers

df[['number_plus_one', 'number_minus_one']] = pd.DataFrame(zip(*df['numbers'].apply(dumb_func))).transpose()

To understand it, take it apart piece by piece. Look at zip(*df['numbers'].apply(dumb_func)) in isolation (wrap it in list() to see its contents). The * unpacks the series so that each tuple is passed to zip as a separate argument, and zip then regroups them into two tuples: one holding all the first elements and one holding all the second elements. Then look at what happens when you build a DataFrame from that: each of the two tuples becomes a row, which is why the transpose is necessary. For more on zip, see docs.python.org/3.8/library/functions.html#zip
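Here is a minimal sketch of the steps described above, using the question's dumb_func and df. The intermediate names (tuples, unzipped, wide, tall) are only illustrative and do not appear in the original one-liner.

```python
import pandas as pd

def dumb_func(number):
    return number + 1, number - 1

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 5, 6, 7]})

# Step 1: apply returns a Series whose elements are 2-tuples.
tuples = df['numbers'].apply(dumb_func)

# Step 2: * passes each tuple to zip as a separate argument; zip regroups
# them by position, so we get one tuple of "plus one" values and one
# tuple of "minus one" values.
unzipped = list(zip(*tuples))

# Step 3: each of the two tuples becomes a row, giving a 2 x 7 frame.
wide = pd.DataFrame(unzipped)

# Step 4: transposing gives 7 rows x 2 columns, the same length as df.
tall = wide.transpose()

df[['number_plus_one', 'number_minus_one']] = tall
print(df)
```

Printing unzipped, wide.shape and tall.shape along the way should make it clear what each step contributes.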

SimonR
    Thanks, it works! Could you explain this code? What is * doing? Why do I need to transpose the generated dataframe? – Stefano May 24 '20 at 17:52

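Method 1: When you don't use a separate function (inline lambda; x['numbers'] selects the column by label instead of relying on positional x[0]):

df[['numbers_plus_one','numbers_minus_one']]=pd.DataFrame(df.apply(lambda x: (x['numbers']+1,x['numbers']-1),axis=1).values.tolist())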

Method 2: When you already have test (i.e. the series of tuples mentioned in the question):

df[['numbers_plus_one','numbers_minus_one']]=pd.DataFrame(test.values.tolist())
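One thing to watch with Method 2: the new DataFrame gets a fresh 0..n-1 index, and as far as I know column assignment aligns on index labels, so this only lines up when df also has the default index. A sketch of a variant that carries the index over, assuming a hypothetical frame indexed by 'a', 'b', 'c' (the index and the names cols and expanded are mine, not from the question):

```python
import pandas as pd

def dumb_func(number):
    return number + 1, number - 1

# Hypothetical frame with a non-default index, to show why index= matters.
df = pd.DataFrame({'numbers': [1, 2, 3]}, index=['a', 'b', 'c'])
test = df['numbers'].apply(dumb_func)

cols = ['numbers_plus_one', 'numbers_minus_one']

# Reuse test's index so the new columns align with df's rows on assignment.
expanded = pd.DataFrame(test.tolist(), index=test.index, columns=cols)
df[cols] = expanded
print(df)
```

With the default index from the question, the shorter form above behaves the same.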

I hope this is helpful

  • Thanks. Method 1 is not very useful to me: dumb_func was only an example, and I really need to call a function because the operations are not so trivial. Method 2 is very interesting because it is simple and clear. Is it faster or slower than the one suggested by SimonR? – Stefano May 24 '20 at 18:38
  • I think the method suggested by SimonR is faster, because it does not convert anything to lists. If you have to do it that way, try this: ```df[['numbers_plus_one','numbers_minus_one']]=pd.DataFrame(zip(*df.apply(lambda x: (x['numbers']+1,x['numbers']-1),axis=1))).transpose()``` – SUDHEER TALLURI May 24 '20 at 18:55
  • OK, thanks. Your Method 2 was also very instructive, so thank you :) – Stefano May 25 '20 at 16:33
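Neither answer was actually timed in this thread, so the speed question is best settled by measuring on your own data. A rough sketch with timeit, assuming 1,000 rows and 10 repetitions (both arbitrary choices of mine, as are the helper names via_zip and via_tolist):

```python
import timeit

import pandas as pd

def dumb_func(number):
    return number + 1, number - 1

# Arbitrary size for illustration; use something close to your real data.
df = pd.DataFrame({'numbers': list(range(1000))})
test = df['numbers'].apply(dumb_func)

def via_zip():
    # SimonR's approach: unzip the tuples, build a frame, transpose it.
    return pd.DataFrame(zip(*test)).transpose()

def via_tolist():
    # Method 2: build the frame directly from a list of tuples.
    return pd.DataFrame(test.values.tolist())

repeats = 10
for fn in (via_zip, via_tolist):
    seconds = timeit.timeit(fn, number=repeats)
    print(f'{fn.__name__}: {seconds / repeats:.6f} s per call')
```

Both helpers produce the same values, so whichever is faster on your machine and pandas version can be swapped in directly.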