
I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks like (this is fake data - not real world):

[image of the example dataframe not available]

I have two tuples, each containing a collection of numbers stored as strings. For example:

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

I want to retrieve the subset of rows from tempDF in which the column "medcode" OR the column "enttype" holds a value that appears in the corresponding tuple above. Thus, from the example above, I would retrieve a subset containing the rows with index 8, 9 and 11.

Until updating some packages earlier today (too many now to work out which has started throwing the warning), this did work:

tempDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]

But now it is throwing the warning:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Looking at the API, isin states the condition "if ALL" is in the iterable collection. I want an "if ANY" condition.
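For reference, a minimal sketch of how `Series.isin` behaves, using made-up toy data rather than the real dataframe. It checks each row's value on its own and reports True if that value appears anywhere in the collection, which is effectively the "if ANY" behaviour wanted here:

```python
import pandas as pd

# Hypothetical toy data, not the real dataframe from the question.
df = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": ["99", "131", "372", "372"],
})
codeset = ("6108", "532", "98120")

# One boolean per row: True where that row's value is a member of codeset.
# The code '532' never occurs in the column, and that does not matter.
mask = df["medcode"].isin(codeset)
print(mask.tolist())  # [True, False, False, True]
```

So not every code in the tuple has to be present in the column for a row to match.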

UPDATE #1

The problem appears when using the | operator, and also with the np.logical_or method. If I remove the second isin condition, i.e. just keep tempDF[tempDF["medcode"].isin(codeset)], then no warning is thrown, but I'm only subsetting on one of the two possible conditions.

Anthony Nash
  • Looks good to me, are you sure the warning is caused by this line? – Ynjxsjmh May 04 '22 at 17:39
  • I'm afraid so. I've loaded the code in PyCharm and I can make the expression execute in debug mode, and each time the warning is displayed in the Console window. – Anthony Nash May 05 '22 at 09:01
  • Check out this [question](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur) with the same error message. A couple of workarounds are proposed there. – DF.Richard May 08 '22 at 23:51
  • What are your pandas and numpy versions? – JuliettVictor May 11 '22 at 05:46
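To follow up on the two diagnostic questions in the comments (which versions, and which line raises the warning), here is a rough sketch. It assumes toy data in place of the real dataframe, and uses the standard-library `warnings` module to turn any `FutureWarning` into an exception so that it surfaces at the expression that triggers it:

```python
import warnings

import numpy as np
import pandas as pd

# Report the installed versions, as asked in the comments.
print("pandas", pd.__version__)
print("numpy", np.__version__)

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": ["99", "131", "372", "372"],
})
codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

mask = None
with warnings.catch_warnings():
    # Escalate FutureWarning to an exception inside this block only.
    warnings.simplefilter("error", FutureWarning)
    try:
        mask = df["medcode"].isin(codeset) | df["enttype"].isin(additionalClinicalCodes)
    except FutureWarning as exc:
        print("FutureWarning raised by this expression:", exc)

print(mask)
```

If the exception fires on the real data but not on toy data like this, that would point at the contents or dtypes of the real columns rather than at the `|` operator itself.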

5 Answers

import numpy as np
tempDF = tempDF[np.logical_or(tempDF["medcode"].isin(codeset), tempDF["enttype"].isin(additionalClinicalCodes))]
safay
  • Thanks @safay. I didn't know of np.logical_or. Unfortunately, this still throws the warning: `C:\Users\yewro\anaconda3\envs\HCDD\lib\site-packages\numpy\lib\arraysetops.py:583: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison mask |= (ar1 == a)`. The "isin" states ALL content must be present, and as each column can only take one value, this throws the warning. I've updated my question to make the data a little clearer. – Anthony Nash May 05 '22 at 09:00
  • Why don't you just combine `codeset` and `additionalClinicalCodes` `newTempTuple= (codeset+ additionalClinicalCodes)` and use `isin(newTempTuple)` – A. Ahmed May 14 '22 at 06:57

I'm unable to reproduce your warning (I assume you are using an outdated numpy version). However, I believe it is related to your enttype column being a numerical type while the values in additionalClinicalCodes are strings.
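If that is the cause, one possible workaround is to make the column and the lookup values the same type before calling `isin`. The sketch below is only an illustration: it assumes enttype holds integers (a guess about the real data) and uses made-up rows.

```python
import pandas as pd

# Hypothetical data: enttype deliberately stored as integers (an assumption).
df = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": [99, 131, 372, 372],
})
codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

# Inspect the column types first; a numeric dtype here would confirm the mismatch.
print(df.dtypes)

# Convert the numeric column to strings so both sides of isin are strings.
enttype_as_str = df["enttype"].astype(str)

mask = df["medcode"].isin(codeset) | enttype_as_str.isin(additionalClinicalCodes)
subset = df[mask]
print(subset)
```

Converting the tuple to integers instead, e.g. `[int(c) for c in additionalClinicalCodes]`, would be the mirror-image fix. Which direction is better depends on how the codes are used elsewhere.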

Xnot

Try this:

tempDF = tempDF[tempDF["medcode"].isin(list(codeset)) | tempDF["enttype"].isin(list(additionalClinicalCodes))]
Alex

Boiling your question down to an executable example:

import pandas as pd

tempDF = pd.DataFrame({'medcode': ['6108', '6154', '95744', '98120'], 'enttype': ['99', '131', '372', '372']})

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

newDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]
print(newDF)
print("Pandas Version")
print(pd.__version__)

This returns for me

  medcode enttype
0    6108      99
1    6154     131
3   98120     372
Pandas Version
1.4.2

Thus I am not able to reproduce your warning.

jugi

This looks like strange numpy behaviour. I think your way is the right way to do this, but if the warning bothers you, try this:

tempDF = tempDF[
    (
        tempDF.medcode.isin(codeset).astype(int) +
        tempDF.enttype.isin(additionalClinicalCodes).astype(int)
    ) >= 1
]