14

I have a pandas DataFrame and want to use apply to generate two new columns from the existing data, but I get this error: ValueError: Wrong number of items passed 2, placement implies 1

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df['C', 'D'] = df.apply(myfunc1 ,axis=1)

Starting DF:

   A  B
0  6  1
1  8  4

Desired DF:

   A  B  C   D
0  6  1  16  56
1  8  4  18  58
user2242044

6 Answers

16

Based on your latest error, you can avoid it by returning the new columns as a Series:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1, axis=1)
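As a small variation on the same idea (a sketch, not part of the answer above), the returned Series can carry the column labels itself, so that apply produces a DataFrame with columns C and D that can be joined back onto df. The sample values are taken from the question; the names new_cols and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

def myfunc1(row):
    # Label the returned values so the resulting columns are named directly
    return pd.Series({'C': row['A'] + 10, 'D': row['A'] + 50})

# apply returns a DataFrame here because every row yields a Series
new_cols = df.apply(myfunc1, axis=1)

# join aligns on the index and appends C and D to the original columns
result = df.join(new_cols)
```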
oim
7

Be aware that the accepted answer is slow and uses a lot of memory; see https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/

Following the suggestion presented there, a faster version looks like this:

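def myfunc1(a):
    c = a + 10
    d = a + 50
    return c, d

def run_loopy(df):
    # Collect each new column in a plain list, one row at a time
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    # Return a DataFrame aligned to df's index so both columns assign cleanly
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

df[['C', 'D']] = run_loopy(df)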
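For arithmetic as simple as this, no Python-level loop is needed at all. The following is a sketch that is not taken from the answer or the linked post: it assumes both new columns can be written as plain column arithmetic, and uses the sample values from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# Vectorized arithmetic on the whole column; assign returns a new DataFrame
df = df.assign(C=df['A'] + 10, D=df['A'] + 50)
```

This only applies when the per-row logic can be expressed with column operations; otherwise one of the apply or loop-based versions is still needed.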
tobyvd
Federico Dorato
7

It works for me:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return C, D

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')
df

The key change is adding result_type='expand', which expands the returned tuple into separate columns.
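To see what the argument changes, here is a minimal sketch comparing the default behaviour with result_type='expand' on the sample values from the question. The names as_series and as_frame are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

def myfunc1(row):
    return row['A'] + 10, row['A'] + 50

# Default: each returned tuple is kept as a single object in a Series
as_series = df.apply(myfunc1, axis=1)

# 'expand': each tuple element becomes its own column (labelled 0 and 1)
as_frame = df.apply(myfunc1, axis=1, result_type='expand')

df[['C', 'D']] = as_frame
```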

Marcelo
3

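df['C','D'] is treated as a single column whose label is the tuple ('C', 'D'), not as two columns. To assign two columns, index with a list of labels, i.e. df[['C','D']]. On pandas 0.23 or later, also pass result_type='expand' so the list returned by myfunc1 is split into columns:

df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')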

    A  B   C   D
0  4  6  14  54
1  5  1  15  55

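Or you can unpack the result into the two columns separately. The zip(*...) transposes the per-row lists into one sequence per column:

df['C'], df['D'] = zip(*df.apply(myfunc1, axis=1))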
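The difference between the two bracket forms can be checked directly. This is a small sketch using the sample values from the question; the names single_key_columns and list_key_columns are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# A comma inside single brackets builds a tuple, so this creates ONE column
# whose label is the tuple ('C', 'D')
df['C', 'D'] = 0
single_key_columns = list(df.columns)

df2 = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# A list inside the brackets addresses two separate columns
df2[['C', 'D']] = [[16, 56], [18, 58]]
list_key_columns = list(df2.columns)
```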
Bharath M Shetty
2

I believe you can achieve similar results to @Federico Dorato's answer without a for loop: return a list rather than a Series, then use apply with a lambda and to_list() to expand the results.

It's cleaner code, and in the timings below it performs as well or faster. (As written, the test frames have 2 rows of random integers below 10,000,000.)

Federico's code

import time

import numpy as np
import pandas as pd

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))
    def run_loopy(df):
        Cs, Ds = [], []
        for _, row in df.iterrows():
            c, d, = myfunc1(row['A'])
            Cs.append(c)
            Ds.append(d)
        return pd.Series({'C': Cs,
                        'D': Ds})

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return c, d

    start = time.time()
    df[['C', 'D']] = run_loopy(df)
    end = time.time()

    run_time.append(end-start) 
print(np.average(run_time)) # 0.001240386962890625

Using lambda and to_list

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return [c, d]

    start = time.time()
    df[['C', 'D']] = df['A'].apply(lambda x: myfunc1(x)).to_list()
    end = time.time()
    run_time.append(end-start)
print(np.average(run_time)) #output 0.0009996891021728516
born_naked
1

Add extra brackets when assigning to multiple columns.

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

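# On pandas 0.23+, result_type='expand' splits the returned list into columns
df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')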
gabe_