Create a new column from two existing columns applying a generic function (x,y) so that I can use the function with different columns

Question

I am trying to calculate a discount that I would like to apply to each row of two columns of my dataframe and add the result to a new column.

I have already tried many approaches, following existing examples, but every time an error occurs.

I define the function as:

def delta_perc(x,y):
    if y == 0:
        return 0
    else:
        return (x-y)/x*100

and then try to apply the function to my dataframe

ordini["discount"] = ordini.apply(delta_perc(ordini["revenue1"],ordini["revenue2"]), axis=1)

I expected a new column where each row was the result of the function applied to ordini["revenue1"] and ordini["revenue2"].

But I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried all the suggestions from here, but every time an error occurred.

Are you sure it's when `y==0` and not `x==0` that you return 0, given that you divide by `x`, which leads to `np.inf`? — ALollz, Apr 15 '19 at 15:45
Sure! While I was focusing on the second part of the code I was making that bigger mistake. — rafspo, Apr 16 '19 at 15:50

piRSquared · Accepted Answer · 2019-04-15T16:04:28.867

You are getting a few concepts mixed up. When you use pandas.DataFrame.apply (with axis=1) you are iterating through each row and passing that row (as a pandas.Series object) to the function you used when you called apply.
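To make this concrete, here is a minimal sketch of what apply hands to the function. The two-row frame and the helper name describe_row are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; column names follow the question.
ordini = pd.DataFrame({"revenue1": [100.0, 80.0], "revenue2": [90.0, 0.0]})

seen = []  # records the type of each object apply passes in

def describe_row(row):
    # With axis=1, `row` is a pandas.Series indexed by the column names.
    seen.append(type(row).__name__)
    return row["revenue1"] - row["revenue2"]

# Note that the function itself is passed, not the result of calling it.
result = ordini.apply(describe_row, axis=1)
print(seen)
print(result.tolist())
```

The key point is that apply receives the function object and calls it once per row.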

First Point of Failure

Instead, you are calling your function inside the apply call and passing two whole columns to it. That means apply would receive the function's return value rather than the function itself. Since your function does not return a callable object, apply would fail even if the function call succeeded.

Second Point of Failure

Also, your function is designed to work on scalar values (hence if y == 0:). When you pass a column such as ordini["revenue1"] (which is a pandas.Series object), Python tries to evaluate if pandas.Series == 0:, and that is what generates the error you see:

ValueError: The truth value of a Series is ambiguous.
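The error can be reproduced in isolation. The sketch below uses a made-up Series y; comparing it to 0 gives one boolean per element, and an if statement cannot reduce that to a single True or False.

```python
import pandas as pd

# Hypothetical values standing in for ordini["revenue2"].
y = pd.Series([90.0, 0.0])

# Comparing a Series to a scalar is elementwise and yields a boolean Series.
mask = y == 0

try:
    if mask:  # Python needs a single True/False here
        pass
    raised = False
except ValueError:
    # "The truth value of a Series is ambiguous..."
    raised = True

print(mask.tolist(), raised)
```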

Approach #1

Fix your function and don't use apply

def delta_perc(x, y):
    return x.sub(y).div(x).mask(x == 0, 0).mul(100)

ordini["discount"] = delta_perc(ordini["revenue1"], ordini["revenue2"])

Approach #2

Fix your function and use map. This is similar to using a comprehension.

def delta_perc(x, y):
    if x == 0:
        return 0
    else:
        return (x - y) / x * 100

ordini["discount"] = [*map(delta_perc, ordini["revenue1"], ordini["revenue2"])]

Approach #3

Actually using apply

def delta_perc(x, y):
    if x == 0:
        return 0
    else:
        return (x - y) / x * 100

# Because remember `apply` takes a function that gets a row (or column) passed to it
ordini["discount"] = ordini.apply(
    lambda row: delta_perc(row['revenue1'], row['revenue2']),
    axis=1
)
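As a quick sanity check, the sketch below runs all three approaches on a small made-up frame to show that they agree. The data and the names delta_perc_series and delta_perc_scalar are assumptions introduced here so that both versions of the function can coexist in one script.

```python
import pandas as pd

# Hypothetical data, including a row where revenue1 is 0.
ordini = pd.DataFrame(
    {"revenue1": [100.0, 80.0, 0.0], "revenue2": [75.0, 0.0, 5.0]}
)

def delta_perc_series(x, y):
    # Approach #1: operates on whole Series at once.
    return x.sub(y).div(x).mask(x == 0, 0).mul(100)

def delta_perc_scalar(x, y):
    # Approaches #2 and #3: operates on one pair of scalars.
    if x == 0:
        return 0
    return (x - y) / x * 100

a1 = delta_perc_series(ordini["revenue1"], ordini["revenue2"])
a2 = [*map(delta_perc_scalar, ordini["revenue1"], ordini["revenue2"])]
a3 = ordini.apply(
    lambda row: delta_perc_scalar(row["revenue1"], row["revenue2"]),
    axis=1,
)
print(a1.tolist(), a2, a3.tolist())
```

Approach #1 is usually the one to prefer on large frames, since it avoids calling a Python function once per row.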

Thanks! You have given me really precious insights! — rafspo, Apr 16 '19 at 15:46

score 2 · Answer 2 · answered Apr 15 '19 at 15:22

You can also try:

ordini["discount"] = [delta_perc(a,b) for a,b in zip(ordini["revenue1"],ordini["revenue2"])]

Quang Hoang

score 2 · Answer 3 · answered Apr 15 '19 at 15:44

You should apply this calculation to the entire Series with np.where:

import pandas as pd
import numpy as np

def delta_perc(x, y):
    return np.where(y != 0, (x-y)/x*100, 0)
    # I think you may want when x != 0, since you divide by x: 
    #return np.where(x != 0, (x-y)/x*100, 0)

Example:

np.random.seed(12)
df = pd.DataFrame(np.random.randint(0,10,(10,2)))

df['new_col'] = delta_perc(df[0], df[1])
#   0  1     new_col
#0  6  1   83.333333
#1  2  3  -50.000000
#2  3  0    0.000000
#3  6  1   83.333333
#4  4  5  -25.000000
#5  9  2   77.777778
#6  6  0    0.000000
#7  5  8  -60.000000
#8  2  9 -350.000000
#9  3  4  -33.333333

pythonjokeun · Answer 4 · 2019-04-15T15:54:15.743

1

Have you tried adding a lambda inside apply, like this?

ordini["discount"] = ordini.apply(
    lambda x: delta_perc(x["revenue1"], x["revenue2"]), axis=1
)

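Try this if you want something that is typically faster than apply with axis=1. Note that np.vectorize is still essentially a Python-level loop, so operating on whole columns directly is usually faster still.

import numpy as np

delta_perc_vec = np.vectorize(delta_perc)
# Use the same dataframe (ordini) on both sides of the assignment
ordini["discount"] = delta_perc_vec(ordini["revenue1"].values, ordini["revenue2"].values)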
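One thing to watch with np.vectorize: unless otypes is given, NumPy infers the output dtype from the result of the first call. With a scalar function that returns the integer 0 for some rows, that can matter when such a row comes first. The sketch below passes otypes=[float] to pin the result type; the data is made up for illustration, and delta_perc is the scalar version that guards against x == 0.

```python
import numpy as np
import pandas as pd

def delta_perc(x, y):
    # Scalar version: returns the int 0 when x is 0, a float otherwise.
    if x == 0:
        return 0
    return (x - y) / x * 100

# Hypothetical data whose first row hits the `return 0` branch.
ordini = pd.DataFrame(
    {"revenue1": [0.0, 100.0, 3.0], "revenue2": [5.0, 75.0, 1.0]}
)

# otypes=[float] fixes the output dtype instead of inferring it
# from the first element's result.
delta_perc_vec = np.vectorize(delta_perc, otypes=[float])
ordini["discount"] = delta_perc_vec(
    ordini["revenue1"].values, ordini["revenue2"].values
)
print(ordini)
```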

edited Apr 15 '19 at 15:54

answered Apr 15 '19 at 15:14
