How to use if/else in pandas/NumPy when applying a function to all rows, fast

Question

I have a dataframe df_ia:

    dod1    dod2
0   0       0
1   200806  0
2   200806  0
3   200806  0
4   200806  0
5   200806  0
6   200806  0
7   200806  0

and a function that I want to apply to every row:

def life_status(dod1, dod2):
    if dod1.any() == 0:
        ls1 = '1'
    else:
        ls1 = '0'
    if dod2.any() == 0:
        ls2 = '1'
    else:
        ls2 = '0'
    lifestatus = ls1 + ls2
    return lifestatus

df_ia['lifestatus'] = life_status(df_ia['dod1'].values,df_ia['dod2'].values)

But I found that I can't directly use

if dod1.any()

as the condition, so I tried something like

if np.any(dod1==0):
   ls1='1'

but it still does not work.

The output should look like:

    dod1  dod2 lifestatus
0   0       0   11
1   200806  0   01
2   200806  0   01
3   200806  0   01
4   200806  0   01
5   200806  0   01
6   200806  0   01
7   200806  0   01
8   200806  0   01
9   200806  0   01

I can use this code to achieve this,

def life_status(row):
    if row['dod1'] == 0:
        ls1 = '1'
    else:
        ls1 = '0'
    if row['dod2'] == 0:
        ls2 = '1'
    else:
        ls2 = '0'
    lifestatus = ls1 + ls2
    return lifestatus
df_ia['lifestatus'] = df_ia.apply(lambda row: life_status(row), axis=1)

but it is very slow, which is why I am posting this question.

@William I think what you want there is what Psidom mentioned. Is it? How about you write the pseudo code of the logic of `life_status()`? — CypherX, Jul 14 '21 at 01:16
Hi @Psidom can you help me with this very similar question:https://stackoverflow.com/questions/68371165/numpy-ndarray-object-has-no-attribute-str-while-using-if-else-in-numpy-panda — William, Jul 14 '21 at 02:06

CypherX · Accepted Answer · 2021-07-14T20:02:38.833


Solution

Based on what you explained in the comments section, the function you originally shared had faulty logic, which misled my previous solution. You need to evaluate str(int(dod1[i] == 0)) + str(int(dod2[i] == 0)) for each row and return a Series or numpy.ndarray.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'dod1': [0] + [200806 for _ in range(7)], 
    'dod2': [0 for _ in range(8)],
})

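def life_status(dod1: np.ndarray, dod2: np.ndarray) -> np.ndarray:
    # Convert each boolean mask to '1'/'0' strings, one entry per row.
    ls1 = (dod1 == 0).astype(int).astype(str)
    ls2 = (dod2 == 0).astype(int).astype(str)
    # np.char.add concatenates string arrays element-wise; plain `+` on
    # NumPy string arrays is not supported in older NumPy releases.
    return np.char.add(ls1, ls2)

life_status(df['dod1'].values, df['dod2'].values).tolist()

## Output:
# ['11', '01', '01', '01', '01', '01', '01', '01']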

Or, equivalently, directly use this on the dataframe.

(df.dod1 == 0).astype(int).astype(str) + (df.dod2 == 0).astype(int).astype(str)
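
As a side note on why the `if dod1.any() == 0` test from the question could not work: `.any()` collapses a whole column into a single boolean, so the `if` branch runs once for the entire frame rather than once per row. Below is a minimal sketch of an element-wise alternative using `np.where`, assuming the same `dod1`/`dod2` columns as in the question. The name `life_status_where` is my own and is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'dod1': [0] + [200806] * 7,
    'dod2': [0] * 8,
})

# .any() reduces the whole column to one boolean, so a plain `if`
# sees a single value instead of one value per row.
print(df['dod1'].values.any())  # True, because at least one entry is non-zero

def life_status_where(dod1, dod2):
    # np.where(condition, a, b) picks a or b element by element,
    # which is the vectorized counterpart of if/else.
    ls1 = np.where(dod1 == 0, '1', '0')
    ls2 = np.where(dod2 == 0, '1', '0')
    # np.char.add concatenates the two string arrays element-wise.
    return np.char.add(ls1, ls2)

df['lifestatus'] = life_status_where(df['dod1'].values, df['dod2'].values)
print(df['lifestatus'].tolist())
# ['11', '01', '01', '01', '01', '01', '01', '01']
```

This should give the same labels as the row-wise `apply` version from the question, without calling a Python function for every row.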

A Note to the reader

A more generic case: if you want to assign, say, 4 when (dod1 == 0) is True and 5 when it is False, you can do it as follows.

# schema:
# - condition: dod1 == 0 --> True: 4, False: 5
# - condition: dod2 == 0 --> True: 7, False: 8
cond1, cond2 = (df.dod1 == 0), (df.dod2 == 0)
((cond1 * 4 + ~cond1 * 5).astype(str) + (cond2 * 7 + ~cond2 * 8).astype(str)).tolist()

## Output
# ['47', '57', '57', '57', '57', '57', '57', '57']

You can generalize this further and substitute any value (str, int, float) for True and for False.

((df.dod1 == 0).astype(str).replace({'True': '4', 'False': '5'}) +
 (df.dod2 == 0).astype(str).replace({'True': '7', 'False': '8'})).tolist()

## Output
# ['47', '57', '57', '57', '57', '57', '57', '57']
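
The same idea can be wrapped in a small reusable function. This is only a sketch: `label_by_condition` is a hypothetical helper name, not something from pandas or from the question, and it assumes both labels can be turned into text with `str()`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'dod1': [0] + [200806] * 7,
    'dod2': [0] * 8,
})

def label_by_condition(cond, if_true, if_false):
    # Hypothetical helper: map a boolean Series to two string labels.
    # Labels go through str() so ints and floats are accepted as well.
    labels = np.where(cond, str(if_true), str(if_false))
    return pd.Series(labels, index=cond.index)

result = (
    label_by_condition(df['dod1'] == 0, 4, 5)
    + label_by_condition(df['dod2'] == 0, 7, 8)
)
print(result.tolist())
# ['47', '57', '57', '57', '57', '57', '57', '57']
```

For more than two branches per column, `np.select` takes a list of conditions and a list of choices and may be a better fit than nested `np.where` calls.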

edited Jul 14 '21 at 20:02

answered Jul 14 '21 at 01:27

CypherX


@William Please take a look at this and let me know if you have any questions. Also, please provide your logic of the function `life_status()`, as I am afraid you may have incorrectly implemented it. – CypherX Jul 14 '21 at 01:41

Hi @CypherX, can you help me with this very similar question: https://stackoverflow.com/questions/68371165/numpy-ndarray-object-has-no-attribute-str-while-using-if-else-in-numpy-panda – William Jul 14 '21 at 02:06
@William I just posted an answer to your other question. – CypherX Jul 14 '21 at 02:30
This answer is not working, I just checked; it seems to return only the first condition. – William Jul 14 '21 at 02:59
Please write down your logic for the function. – CypherX Jul 14 '21 at 03:03
@William The data that I was using was different from yours. Now they are the same. So, the function should give you `2` as the result. – CypherX Jul 14 '21 at 03:08
Thank you for your reply. First, I need ls1 = '1', so my logic is: `if dod1 == 0: ls1 = '1' else: ls1 = '0'`, `if dod2 == 0: ls2 = '1' else: ls2 = '0'`, then `lifestatus = ls1 + ls2` and `return lifestatus`. So, for example, the second row should get 01, but with your code I get 11. – William Jul 14 '21 at 03:13
Okay. This is why I was asking for your logic in plain English, as you are evaluating per row. However, the function that you shared checks the two columns as a whole (not per row). – CypherX Jul 14 '21 at 03:52
Thank you for letting me know. Is there any fast way to check per row? Right now I'm using `def life_status(row): if row['dod1'] == 0: ls1 = '1' else: ls1 = '0' if row['dod2'] == 0: ls2 = '1' else: ls2 = '0' lifestatus = ls1 + ls2 return lifestatus` together with `df_ia['lifestatus'] = df_ia.apply(lambda row: life_status(row), axis=1)`, and it is very slow. – William Jul 14 '21 at 03:55
Could you please first explain this requirement in your question? Please provide "input" and "expected output" for each row. – CypherX Jul 14 '21 at 03:58
Yes, sir! I just updated my question, please check! – William Jul 14 '21 at 04:03
@William Please check now. – CypherX Jul 14 '21 at 05:10
Nice! It works! Thank you so much! But what if ls1 is not '1' or '0'? I mean something like this: `if row['dod1'] == 0: ls1 = '2' else: ls1 = '3'` and `if row['dod2'] == 0: ls2 = '4' else: ls2 = '5'`. I ask because I think you directly convert True or False to 1 and 0, right? – William Jul 14 '21 at 05:34
@William Yes, you got it right. I added a small snippet to the answer to give you some idea of how to generalize the function for the scenario you mentioned. I hope it helps. – CypherX Jul 14 '21 at 19:33
Appreciate it a lot! – William Jul 14 '21 at 19:52
Hi friend, can you help me with this question? https://stackoverflow.com/questions/68476193/how-to-merge-2-pandas-daataframes-base-on-multiple-conditions-faster – William Jul 21 '21 at 20:38
