
I have a DataFrame named `df` with a column `col` containing the values `True`, `False` and `"N/A"` (types are `bool`, `bool` and `str` respectively). I want to select only the rows containing `True`.

`df[df.col == True]` works, but generates the warning `PEP 8: comparison to True should be 'if cond is True:' or 'if cond:'`.

Is there a PEP 8-compliant way to do this?

Michael Litvin
  • What is your pandas version? In pandas 0.24.2 there is no error. – jezrael Apr 14 '19 at 13:16
  • Tested with `df = pd.DataFrame({'col':[True, False, 'N/A']})` – jezrael Apr 14 '19 at 13:19
  • `is` is not overloaded by Pandas, thus it's not an option in this case... – Michael Litvin Apr 14 '19 at 13:22
  • In cases like this, if you need an overloaded `__eq__`, yes you can sometimes go against PEP8. See how to disable this specific warning in your lint tool (`# noqa` usually does the trick). – Norrius Apr 14 '19 at 13:24
  • Related: [Expressions with “== True” and “is True” give different results](https://stackoverflow.com/questions/36825925/expressions-with-true-and-is-true-give-different-results) – Georgy Apr 14 '19 at 13:46
  • @Norrius `# noqa` didn't work in PyCharm, any idea how to suppress it there? – Michael Litvin Apr 14 '19 at 14:17

1 Answer


Similar questions have been asked before, for example "pandas: Do I have to deviate from style conventions (PEP 8)?", but they all describe the simple case where the column contains only `True` and `False` values. In that case, you can just write `df[df.col]`.
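As a minimal sketch of that simple case, assuming a hypothetical frame `df_bool` whose column holds nothing but booleans, the column itself serves as the mask:

```python
import pandas as pd

# Hypothetical frame: the column contains only booleans, so its dtype is bool.
df_bool = pd.DataFrame({'col': [True, False, True]})

# A boolean Series can be passed directly as a row mask.
selected = df_bool[df_bool.col]
print(selected.index.tolist())
```

No comparison operator appears in the expression, so there is nothing for the style checker to flag.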

In your case, though, you can't do that: because the column also contains the string "N/A", it is not a pure boolean mask and `df[df.col]` gives an error. You have some other options:

  1. Using `pd.Series.eq`:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'col': [True, False, 'N/A']})
    >>> df[df.col.eq(True)]
        col
    0  True
    
  2. Checking against `"N/A"` first and then comparing what's left to `True`. Order matters:

    >>> df[(df.col != 'N/A') & df.col]
        col
    0  True
    
  3. Replacing `"N/A"` with `np.nan` and using `pd.Series.notnull` or `pd.Series.notna`:

    >>> import numpy as np
    >>> df = df.replace('N/A', np.nan)
    >>> df[df.col.notnull() & df.col]
        col
    0  True
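
For reference, here is a self-contained sketch of the first option as a script. The second mask is not one of the options above; it is an assumed alternative that uses a per-element identity check, which may be useful if values such as the integer `1` (which compares equal to `True` in Python) should be excluded. The names `mask_eq` and `strict_mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col': [True, False, 'N/A']})

# Option 1: element-wise equality via the method form, without writing `== True`.
mask_eq = df.col.eq(True)

# Assumed alternative (not from the options above): keep only elements that are
# the True object itself, using a plain Python identity check per element.
strict_mask = df.col.map(lambda value: value is True)

print(df[mask_eq].index.tolist())
print(df[strict_mask].index.tolist())
```

Both masks are ordinary boolean Series, so they can be combined with other conditions using `&` and `|` in the usual way.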
    
Georgy