
I am new to writing functions with `def`, and I am trying to put logic with multiple `if` conditions inside a function. I want x, y, z to be flexible parameters so that I can change what x, y, z refer to, but I can't get the desired output. Can anyone help?

df =

    date      comp  mark    value   score   test1
0   2022-01-01  a      1       10     100   
1   2022-01-02  b      2       20     200   
2   2022-01-03  c      3       30     300   
3   2022-01-04  d      4       40     400   
4   2022-01-05  e      5       50     500   

Desired output =

        date    comp    mark    value   score   test1
0   2022-01-01  a          1    10       100    200
1   2022-01-02  b          2    20       200    400
2   2022-01-03  c          3    30       300    600
3   2022-01-04  d          4    40       400    4000
4   2022-01-05  e          5    50       500    5000

I can get the result using:

    def frml(df):
        if (df['mark'] > 3) and (df['value'] > 30):
            return df['score'] * 10
        else:
            return df['score'] * 2

    df['test1'] = df.apply(frml, axis=1)

But I can't get the result using this. Isn't the logic the same?

    x = df['mark']
    y = df['value']
    z = df['score']

    def frml(df):
        if (x > 3) and (y > 30):
            return z * 10
        else:
            return z * 2

    df['test1'] = df.apply(frml, axis=1)
Michael Butscher
stvlam22
  • No, the logic is not the same. Within the function, `df` represents a single row of the dataframe, because that's what the `apply` operation gives you. Outside the function, `df['mark']` refers to an entire COLUMN. Why can't you use the first format, which is correct? – Tim Roberts Dec 03 '22 at 04:45
  • Does [this](https://stackoverflow.com/a/73669816/19123103) help? Another resource [here](https://stackoverflow.com/a/73643899/19123103). TL;DR: apply is not good pandas. Use mask or numpy.where instead. – cottontail Dec 03 '22 at 05:52
  • Sorry for the late reply, I was busy with my year-end thesis. Yes, cottontail, thanks for the reference link; it is really helpful. Thanks for your advice. – stvlam22 Dec 12 '22 at 12:11

1 Answer


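You can use `mask` instead of `apply`:

    # True only for rows where both conditions hold
    cond1 = (df['mark'] > 3) & (df['value'] > 30)
    # score * 2 everywhere, replaced by score * 10 where cond1 is True
    df['score'].mul(2).mask(cond1, df['score'].mul(10))

Output: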

0     200
1     400
2     600
3    4000
4    5000
Name: score, dtype: int64

To put the result in the `test1` column (`assign` returns a new DataFrame, so reassign it to `df`):

    df = df.assign(test1=df['score'].mul(2).mask(cond1, df['score'].mul(10)))

Result:

    date        comp    mark    value   score   test1
0   2022-01-01  a       1       10      100     200
1   2022-01-02  b       2       20      200     400
2   2022-01-03  c       3       30      300     600
3   2022-01-04  d       4       40      400     4000
4   2022-01-05  e       5       50      500     5000
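Since the question asks for x, y, z as flexible parameters, the same `mask` idea can be wrapped in a function that takes column names instead of hard-coded columns. This is a minimal sketch on the numeric columns of the question's data; the signature `frml(data, x, y, z, x_min, y_min)` and passing the thresholds as parameters are hypothetical choices for illustration, not part of the original answer:

```python
import pandas as pd

# Numeric columns from the question's sample data (date/comp omitted)
df = pd.DataFrame({
    "mark": [1, 2, 3, 4, 5],
    "value": [10, 20, 30, 40, 50],
    "score": [100, 200, 300, 400, 500],
})

def frml(data, x="mark", y="value", z="score", x_min=3, y_min=30):
    """Hypothetical vectorized helper: x, y, z are column NAMES.

    Rows where data[x] > x_min and data[y] > y_min get data[z] * 10,
    all other rows get data[z] * 2.
    """
    cond = (data[x] > x_min) & (data[y] > y_min)
    return data[z].mul(2).mask(cond, data[z].mul(10))

df["test1"] = frml(df)
print(df["test1"].tolist())  # [200, 400, 600, 4000, 5000]
```

Changing which columns or thresholds are used is then just a different call, e.g. `frml(df, x="value", y="mark", x_min=20, y_min=2)`, with no `apply` involved.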



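Your second function doesn't work because `x`, `y` and `z` are whole columns defined outside the function, so the row that `apply` passes in is ignored. Inside `frml`, `x > 3` is a boolean Series rather than a single True/False, and `and` cannot reduce a Series to one value (pandas raises `ValueError: The truth value of a Series is ambiguous`).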
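If you still want a row-wise function with flexible x, y, z, a minimal sketch (numeric columns of the question's data only) is to read the values from the row itself and pass the column names through `DataFrame.apply`, which forwards extra keyword arguments to the function. Using `x`, `y`, `z` to hold column *names* is a hypothetical choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "mark": [1, 2, 3, 4, 5],
    "value": [10, 20, 30, 40, 50],
    "score": [100, 200, 300, 400, 500],
})

# Whole-column comparisons combined with `and` fail:
# Python needs one True/False, but each side is a boolean Series.
try:
    (df["mark"] > 3) and (df["value"] > 30)
except ValueError as err:
    print("ValueError:", err)

def frml(row, x, y, z):
    # `row` is ONE row (a Series indexed by column name),
    # so row[x], row[y], row[z] are scalars and `and` works.
    if row[x] > 3 and row[y] > 30:
        return row[z] * 10
    return row[z] * 2

# Keyword arguments after axis=1 are passed on to frml for every row.
df["test1"] = df.apply(frml, axis=1, x="mark", y="value", z="score")
print(df["test1"].tolist())  # [200, 400, 600, 4000, 5000]
```

This works, but it still calls a Python function once per row, which is why the vectorized approaches in this answer are usually preferred.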

Also, you don't need `apply` with a `def` function to produce this output, which is why I showed another way.

Use `mask`, `np.where` or `np.select` instead of `apply` with a `def` function.
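For comparison, a sketch of the same column computed with `np.where` (one condition, two outcomes) and `np.select` (a list of conditions checked in order, plus a default), on the numeric columns of the question's data. `test2` is a hypothetical extra column used only to compare the two results:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mark": [1, 2, 3, 4, 5],
    "value": [10, 20, 30, 40, 50],
    "score": [100, 200, 300, 400, 500],
})
cond1 = (df["mark"] > 3) & (df["value"] > 30)

# np.where(condition, value_if_true, value_if_false)
df["test1"] = np.where(cond1, df["score"] * 10, df["score"] * 2)

# np.select(condlist, choicelist, default): the first matching condition
# wins; rows matching no condition get the default. More conditions can be
# added to both lists to express a longer if/elif chain.
df["test2"] = np.select([cond1], [df["score"] * 10], default=df["score"] * 2)

print(df["test1"].tolist())  # [200, 400, 600, 4000, 5000]
print(df["test2"].tolist())  # [200, 400, 600, 4000, 5000]
```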

Panda Kim