I have a DataFrame named bas. For example, it looks like this:

    nat rac numberOnly  
0   DD  AR  548484554
1   AD  AR  168484245
2   FF  COL 484984554
3   WW  DE  484845225
...

It has 50k+ rows. I tried to get only the records where rac is AR or COL.

I wrote this code:

AR = bas.where(bas.rac == "AR").dropna()
COL = bas.where(bas.rac == "COL").dropna()

DF = pd.DataFrame()
DF = DF.append(AR)
DF = DF.append(COL)

The length of DF is 27429. But the code doesn't look good, especially since I want to filter on more rac values later. So I decided to rewrite it this way:

DF = bas.where(bas.rac == ("AR" or "COL")).dropna()

And in this case DF has 27196 rows.

Why? What's the difference here? Which method is better? Maybe I should use something else, instead?

Daemon Painter
martin

1 Answer

What you're doing will definitely not work. You're looking for isin:

df[df.rac.isin(['AR', 'COL'])]

   nat  rac  numberOnly
0  DD   AR   548484554
1  AD   AR   168484245
2  FF  COL   484984554
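
To see why the two versions in the question give different row counts, here is a minimal sketch. It assumes a small made-up frame shaped like the sample in the question (the four rows below are only an illustration, not the real 50k+ rows). In plain Python, `"AR" or "COL"` is evaluated before pandas ever sees it, and `or` on two non-empty strings returns the first one, so the comparison only matches AR.

```python
import pandas as pd

# Hypothetical sample mirroring the frame shown in the question
bas = pd.DataFrame({
    "nat": ["DD", "AD", "FF", "WW"],
    "rac": ["AR", "AR", "COL", "DE"],
    "numberOnly": [548484554, 168484245, 484984554, 484845225],
})

# `or` returns its first truthy operand, so this is just "AR"
wanted = "AR" or "COL"

# Equivalent to bas[bas.rac == "AR"]: the COL rows are silently lost
only_ar = bas[bas.rac == wanted]

# Membership test against a list of values; easy to extend later
both = bas[bas.rac.isin(["AR", "COL"])]

# Element-wise OR with `|` gives the same rows as isin
same_with_pipe = bas[(bas.rac == "AR") | (bas.rac == "COL")]

print(wanted)                                         # AR
print(len(only_ar), len(both), len(same_with_pipe))   # 2 3 3
```

Boolean indexing is used here instead of `where(...).dropna()`. `where` fills non-matching rows with NaN, which turns integer columns into floats, and `dropna` would also drop any matching row that already had a missing value in some other column.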
yatu
    Well @Artur , I'd suggest you have a look at the dupe, which contains some answers with nice explanations. Also read on what logical operators are used for, then you might see why `bas.rac == ("AR" or "COL")` is not doing at all what you think – yatu Sep 05 '19 at 10:44
    Ok @yatu, thank you very much for the answer and explanation. I know that `or` is something different from `|` etc., but I think you're right and I really need to go deeper into this subject. Thanks again :) – martin Sep 05 '19 at 10:47