0

I have the following problem, and I'm unable to find the right solution.

I have a dataframe.

enter image description here

I need to pass the first_name to another dataframe, only if the following conditions are met:

condition #1 --> If 'id' is 1, then pass 'first_name' ONLY IF 'country' = US AND 'code' = 1 AND 'zip' = 3

condition #2 --> If 'id' is 2, then pass 'first_name' IF 'country' = US. (No need to check for code and zip. Pass first_name irrespective of code and zip)

So, in this dataframe, as per the conditions stated above, it needs to pass only - 'peter', 'mike' and 'jenny'

My code looks like:

filter1 = df['id']=='1'
filter2 = df['country'] ==1
filter3 = df['code']=='1'
filter4 = df['zip'] =='3'

#filtering data 
df.where(filter1 & filter2 & filter3 & filter4, inplace = True)

#then pass first_name

new_df['first_name'] = df['first_name']

But by doing this I'm only able to apply either condition (1) or (2).

Please help. Thank you!

PKV
  • 167
  • 3
  • 13
  • I recommend you look into this post. It will give you a good idea of using `&` and `|` operators. https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas – Joe Ferndz Sep 03 '20 at 06:35

2 Answers2

2

Use boolean indexing with | for chain filters by bitwise OR with filter by column first_name in DataFrame.loc:

#if numbers are not strings remove `''`
filter1 = df['id']==1
filter2 = df['country'] == 'US'
filter3 = df['code']==1
filter4 = df['zip'] ==3
filter5 = df['id']==2

s = df.loc[(filter1 & filter2 & filter3 & filter4) | (filter5 & filter2), 'first_name']
print (s)
1    peter
3     mike
4    jenny
Name: first_name, dtype: object
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    I didn't know I could use OR condition between the filters too. Thank you @jezrael! It worked :) – PKV Sep 03 '20 at 06:17
1

Use .loc with a combination of boolean masks.

new_df = df.loc[
    ( # mask1
        df.id.eq(1) & df.country.eq('US') & df.code.eq(1) & df.zip.eq(3)
    )
        | # or
    ( # mask2
        df.id.eq(2) & df.country.eq('US')
    ),
    'first_name'
]

If you use .where instead of .loc with the same boolean masks you will get a dataframe of the same shape as df but every row that is masked as False will be filled with NaN.

RichieV
  • 5,103
  • 2
  • 11
  • 24
  • Interesting solution. It works too. Thank you @RichieV – PKV Sep 03 '20 at 06:26
  • It is actually the same as jezrael, started typing and didn't notice he had posted when I clicked send, don't feel obligated to vote, I'm glad you got an answer – RichieV Sep 03 '20 at 06:28
  • I marked your answer as useful. Looks like someone else voted down. Not sure why :/ – PKV Sep 03 '20 at 06:33
  • @PKV there's really no need to worry, thanks for your concern – RichieV Sep 03 '20 at 07:06