How to subset Pandas Dataframe using an OR operator whilst avoiding "FutureWarning: elementwise comparison failed;"

Question

I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks like (this is fake data - not real world):

I have two tuples, each contains a collection of numbers as a string type. For example:

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

I want to retrieve a subset of the rows from the tempDF in which the columns "medcode" OR "enttype" have at least one entry in the tuples above. Thus, from the example above, I would retrieve a subset containing rows with the index 8 and 9 and 11.

Until updating some packages earlier today (too many now to work out which has started throwing the warning), this did work:

tempDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]

But now it is throwing the warning:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Looking at the API, isin states that the condition is "if ALL" entries are in the iterable collection. I want an "if ANY" condition.

UPDATE #1

The problem lies with using the | operator; the np.logical_or method behaves the same way. If I remove the second isin condition, i.e., just keep tempDF[tempDF["medcode"].isin(codeset)], then no warning is thrown, but I'm only subsetting on one of the two conditions.

Looks good to me, are you sure the warning is caused by this line? – Ynjxsjmh, May 04 '22 at 17:39
I'm afraid so. I've loaded the code in PyCharm and I can execute the expression in debug mode; each time, the warning is displayed in the Console window. – Anthony Nash, May 05 '22 at 09:01
Check out this [question](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur) with the same error message. A couple of workarounds are proposed there. – DF.Richard, May 08 '22 at 23:51

score 2 · Answer 1 · answered May 04 '22 at 17:41

import numpy as np
tempDF = tempDF[np.logical_or(tempDF["medcode"].isin(codeset), tempDF["enttype"].isin(additionalClinicalCodes))]

safay


Thanks @safay. I didn't know of np.logical_or. Unfortunately, this still throws the warning:
C:\Users\yewro\anaconda3\envs\HCDD\lib\site-packages\numpy\lib\arraysetops.py:583: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)
The "isin" states ALL content must be present, and as each column can only take one value, this throws the warning. I've updated my question to make the data a little clearer. – Anthony Nash May 05 '22 at 09:00
Why don't you just combine `codeset` and `additionalClinicalCodes` into `newTempTuple = (codeset + additionalClinicalCodes)` and use `isin(newTempTuple)`? – A. Ahmed May 14 '22 at 06:57

score 0 · Answer 2 · answered May 11 '22 at 20:56

I'm unable to reproduce your warning (I assume you are using an outdated numpy version). However, I believe it is related to your enttype column being a numeric type while additionalClinicalCodes contains strings.
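
A minimal sketch of how one might check for this, assuming (hypothetically) that enttype was loaded as integers. The sample values are made up for illustration, and mask and newDF are names introduced here. Casting the column to str before calling isin keeps both sides of the comparison the same type:

```python
import pandas as pd

# Hypothetical sample data: enttype stored as integers rather than strings.
tempDF = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": [99, 131, 372, 372],
})

codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

# Inspect the column types first: 'object' usually means strings, 'int64' integers.
print(tempDF.dtypes)

# Cast enttype to str so it is compared with the string codes like-for-like.
mask = (
    tempDF["medcode"].isin(codeset)
    | tempDF["enttype"].astype(str).isin(additionalClinicalCodes)
)
newDF = tempDF[mask]
print(newDF)
```

If the dtypes already show object for both columns, this cast changes nothing and the cause probably lies elsewhere.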

Xnot

score 0 · Answer 3 · answered May 13 '22 at 08:31

Try this:

tempDF = tempDF[tempDF["medcode"].isin(list(codeset)) | tempDF["enttype"].isin(list(additionalClinicalCodes))]

Alex

score 0 · Answer 4 · answered May 13 '22 at 12:29

Boiling your question down to an executable example:

import pandas as pd

tempDF = pd.DataFrame({'medcode': ['6108', '6154', '95744', '98120'], 'enttype': ['99', '131', '372', '372']})

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

newDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]
print(newDF)
print("Pandas Version")
print(pd.__version__)

This returns for me

  medcode enttype
0    6108      99
1    6154     131
3   98120     372
Pandas Version
1.4.2

Thus I am not able to reproduce your warning.

score 0 · Answer 5 · answered May 13 '22 at 21:40

This is strange numpy behaviour. I think your way is the right way to do this, but if the warning bothers you, try this:

tempDF = tempDF[
    (
        tempDF.medcode.isin(codeset).astype(int) +
        tempDF.enttype.isin(additionalClinicalCodes).astype(int)
    ) >= 1
]
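
A quick check, reusing the sample data from the executable example in the earlier answer, that the integer-sum mask selects the same rows as the plain | version. This is only a sketch: it assumes both columns hold strings, and the names in_medcode, in_enttype, or_mask and sum_mask are introduced here for illustration.

```python
import pandas as pd

# Sample data taken from the executable example in the earlier answer.
tempDF = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": ["99", "131", "372", "372"],
})
codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

# Element-wise membership tests, one boolean per row.
in_medcode = tempDF["medcode"].isin(codeset)
in_enttype = tempDF["enttype"].isin(additionalClinicalCodes)

# Boolean OR of the two masks.
or_mask = in_medcode | in_enttype
# Integer-sum version: a row is kept when at least one mask is 1.
sum_mask = (in_medcode.astype(int) + in_enttype.astype(int)) >= 1

# Both masks should agree row by row.
print((or_mask == sum_mask).all())
print(tempDF[sum_mask])
```

Since both forms build the same mask from the same two isin calls, the sum version is unlikely to change the result; whether it avoids the warning depends on where the warning originates.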
