0

I have a dataframe df, of the following form:

df = pd.DataFrame({'C1': {'R1': 'yes','R2': 'no','R3': 'no','R4': 'no','R5': 'no','R6': float("nan")},
 'C2': {'R1': 'on','R2': 'on','R3': 'no','R4': 'yes','R5': 'no','R6': float("nan")},
 'C3': {'R1': 'no', 'R2': 'no', 'R3': 'no', 'R4': 'no', 'R5': 'no', 'R6': float("nan")},
 'C4': {'R1': 'no', 'R2': 'no', 'R3': 'no', 'R4': 'no', 'R5': 'no', 'R6': float("nan")},
 'C5': {'R1': 'yes', 'R2': 'no', 'R3': float("nan"), 'R4': float("nan"), 'R5': 'no', 'R6': float("nan")}}) 

Here's what the dataframe looks like if you print it in the interactive terminal:

>>> print(df)

    C1   C2   C3   C4   C5
R1  yes   on   no   no  yes
R2   no   on   no   no   no
R3   no   no   no   no  NaN
R4   no  yes   no   no  NaN
R5   no   no   no   no   no
R6  NaN  NaN  NaN  NaN  NaN

I would like to delete lines that do not contain at least one "yes" or "on". So the final data frame should be:

   C1  C2  C3  C4  C5
R1 yes on  no  no  yes
R2 no  on  no  no  no
R4 no  yes no  no  NaN

What's the quickest way, without involving for loops or anything like that?

Alex Waygood
  • 6,304
  • 3
  • 24
  • 46
LJG
  • 601
  • 1
  • 7
  • 15
  • You can try to use pandas query function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html see also: https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe – droebi Sep 01 '21 at 09:22

2 Answers2

3

Use pandas.DataFrame.isin with any:

new_df = df[df.isin({"yes", "on"}).any(1)]
print(new_df)

Output:

     C1   C2  C3  C4   C5
R1  yes   on  no  no  yes
R2   no   on  no  no   no
R4   no  yes  no  no  NaN
Chris
  • 29,127
  • 3
  • 28
  • 51
-1

You could use:

df[df.isin(['yes', 'no']).any(axis=1)]

or, if you only have 'yes'/'no'/NaN values:

df.dropna(how='all')
mozway
  • 194,879
  • 13
  • 39
  • 75