Pandas Apply Function That returns two new columns

Question

I have a pandas DataFrame and would like to use `apply` to generate two new columns from the existing data, but I get this error: `ValueError: Wrong number of items passed 2, placement implies 1`

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df['C', 'D'] = df.apply(myfunc1, axis=1)

Starting DF:

   A  B
0  6  1
1  8  4

Desired DF:

   A  B   C   D
0  6  1  16  56
1  8  4  18  58

@coldspeed, the dataframe passed could be many columns, but only two needed for the calculation — user2242044, Dec 25 '17 at 15:16
Possible duplicate of [Apply pandas function to column to create multiple new columns?](https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns) — Federico Dorato, Oct 16 '19 at 09:21

score 16 · Accepted Answer · answered Dec 25 '17 at 16:21

Based on your latest error, you can avoid it by returning the new columns as a Series:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1, axis=1)
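
A variation on the same idea, as a minimal sketch: if the returned Series is labelled, `apply` uses those labels as the column names, so the result can be joined back without listing the names a second time. The sample values are copied from the question's starting frame, and the function name `add_offsets` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question's "Starting DF".
df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

def add_offsets(row):
    # Labelling the Series gives the new columns their names.
    return pd.Series({'C': row['A'] + 10, 'D': row['A'] + 50})

# apply(axis=1) stacks the returned Series into a DataFrame with columns C and D.
new_cols = df.apply(add_offsets, axis=1)
result = df.join(new_cols)
print(result)
```

Note that this still builds one Series per row, so it shares the performance cost of the approach above.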

oim

Please be aware of the huge memory consumption and low speed of the accepted answer, alternative solution below – Federico Dorato Mar 26 '20 at 15:17

score 7 · Answer 2 · edited Mar 24 '21 at 10:47

Please be aware of the huge memory consumption and low speed of the accepted answer: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/

Using the suggestion presented there, a faster version looks like this:

def run_loopy(df):
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    # Return a DataFrame (not a Series of lists) so each list becomes one column
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

def myfunc1(a):
    c = a + 10
    d = a + 50
    return c, d

df[['C', 'D']] = run_loopy(df)
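
For the specific arithmetic in the question, neither `apply` nor an explicit loop is strictly needed, because both operations are element-wise on column A. A minimal sketch, assuming the real calculation can also be written with whole-column operations (which may not hold for more complex row logic):

```python
import pandas as pd

# Sample data copied from the question's "Starting DF".
df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# Whole-column arithmetic avoids calling a Python function once per row.
df = df.assign(C=df['A'] + 10, D=df['A'] + 50)
print(df)
```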

tobyvd

answered Oct 16 '19 at 09:19

Federico Dorato

I think you should edit `Cs, Ds = [], []` (1st row of `run_loopy`) to `v1s, v2s = [], []` or vice versa – codkelden Nov 10 '20 at 19:55
@codkelden Thanks for noticing! I will change v1s and v2s to Cs and Ds, so whoever reads it understands quickly we are talking about the columns – Federico Dorato Nov 11 '20 at 07:26
This is indeed much faster – akotronis Oct 26 '21 at 14:53

score 7 · Answer 3 · answered Mar 11 '21 at 23:43

It works for me:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return C, D

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')
df

The key addition is `result_type='expand'`.

regards!
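
To illustrate why the extra argument matters, here is a minimal sketch, assuming pandas 0.23 or later (where `result_type` is available). Without it, a function that returns a tuple or list yields a single Series holding one tuple per row, which cannot be spread over two columns reliably. With `result_type='expand'`, each element becomes its own column.

```python
import pandas as pd

# Sample data copied from the question's "Starting DF".
df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

def myfunc1(row):
    return row['A'] + 10, row['A'] + 50

# Default: one tuple per row, collected into a single Series.
as_series = df.apply(myfunc1, axis=1)

# result_type='expand': each tuple element becomes its own column (labelled 0 and 1).
as_frame = df.apply(myfunc1, axis=1, result_type='expand')

print(type(as_series).__name__, type(as_frame).__name__)

# The expanded frame has two columns, so it matches the two-column target.
df[['C', 'D']] = as_frame
print(df)
```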

Marcelo

Just had this problem and adding `, result_type='expand'` was the only way I could get this to work, thank you – a11 May 06 '22 at 18:35
I think this is the easiest way, thanks! – ruloweb Aug 22 '23 at 03:43

score 3 · Answer 4 · edited Dec 25 '17 at 15:13

`df['C','D']` is treated as a single column keyed by the tuple `('C', 'D')` rather than as two columns. To assign two columns, index with a list of names instead, i.e. `df[['C','D']]`:

df[['C', 'D']] = df.apply(myfunc1, axis=1)

   A  B   C   D
0  4  6  14  54
1  5  1  15  55
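
The difference between the two indexing forms can be seen directly from the column labels. This is a small sketch using the question's sample values; the second frame `df2` and the literal values assigned to it are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# A comma inside single brackets builds the tuple ('C', 'D'),
# so this creates ONE column whose label is that tuple.
df['C', 'D'] = 0
print(df.columns.tolist())

# A list inside the brackets addresses TWO separate columns.
df2 = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})
df2[['C', 'D']] = pd.DataFrame({'C': [16, 18], 'D': [56, 58]})
print(df2.columns.tolist())
```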

Or you can unpack the results into two separate column assignments, i.e.

df['C'], df['D'] = zip(*df.apply(myfunc1, axis=1))

edited Dec 25 '17 at 15:13

answered Dec 25 '17 at 15:07

Bharath M Shetty

This worked on my example dataset (so upvoted), but does not work on my real dataset despite identical code. Error: `KeyError: "['C' 'D'] not in index"` – user2242044 Dec 25 '17 at 15:10
I need to see how you are assigning the data. Your actual code perhaps. – Bharath M Shetty Dec 25 '17 at 15:10
Same way, the only code that is different is reading in a dataframe from CSV vs using numpy to generate fake data `df[['C', 'D']] = df.apply(myfunc1 ,axis=1)` – user2242044 Dec 25 '17 at 15:11
Your myfunc1 is same as the above? – Bharath M Shetty Dec 25 '17 at 15:11
@user2242044. Your error message shows that there is a missing comma between ‘C’ and ‘D’. – Goose Dec 25 '17 at 15:37
@Goose you know if you don't pass a comma it will be considered as a single string like `'CD'`. Sometimes assignment won't work. Hard to remember the cases. – Bharath M Shetty Dec 25 '17 at 15:38

score 2 · Answer 5 · answered Jul 04 '21 at 19:06

I believe you can achieve similar results to @Federico Dorato's answer without a for loop: return a list rather than a Series, and use apply with a lambda plus to_list() to expand the results.

It's cleaner code, and on the random test df below (2 rows, values drawn from 0 to 10,000,000) it performs as well or faster.

Federico's code

import time

import numpy as np
import pandas as pd

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))
    def run_loopy(df):
        Cs, Ds = [], []
        for _, row in df.iterrows():
            c, d, = myfunc1(row['A'])
            Cs.append(c)
            Ds.append(d)
        return pd.Series({'C': Cs,
                        'D': Ds})

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return c, d

    start = time.time()
    df[['C', 'D']] = run_loopy(df)
    end = time.time()

    run_time.append(end-start) 
print(np.average(run_time)) # 0.001240386962890625

Using lambda and to_list

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return [c, d]

    start = time.time()
    df[['C', 'D']] = df['A'].apply(lambda x: myfunc1(x)).to_list()
    end = time.time()
    run_time.append(end-start)
print(np.average(run_time)) #output 0.0009996891021728516
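
Because the frames timed above contain only two rows, the measurements mostly reflect fixed overhead. A hedged sketch of a comparison on a larger frame follows. The size of 1,000 rows, the helper names `with_loop` and `with_to_list`, and the use of `timeit` are choices made for this sketch, not part of the original answers. Timings depend on the machine and pandas version, so no figures are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

def myfunc1(a):
    return [a / 10, a + 50]

def with_loop(df):
    # Explicit iterrows loop collecting each output column in a list.
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

def with_to_list(df):
    # Series.apply yields one list per row; to_list() makes a list of lists.
    rows = df['A'].apply(myfunc1).to_list()
    return pd.DataFrame(rows, columns=['C', 'D'], index=df.index)

# 1,000 rows is an arbitrary size chosen for this sketch.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10_000_000, size=(1_000, 2)), columns=list('AB'))

loop_result = with_loop(df)
to_list_result = with_to_list(df)
same = np.allclose(loop_result.to_numpy(), to_list_result.to_numpy())

loop_time = timeit.timeit(lambda: with_loop(df), number=5)
to_list_time = timeit.timeit(lambda: with_to_list(df), number=5)
print(f"same result: {same}, loop: {loop_time:.4f}s, to_list: {to_list_time:.4f}s")
```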

score 1 · Answer 6 · answered Dec 25 '17 at 15:06

Add extra brackets when querying for multiple columns.

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df[['C', 'D']] = df.apply(myfunc1, axis=1)
