Suppose I have a data frame with three columns: A, B, and C. I would like to select the rows for which column B satisfies some condition or column C satisfies some condition. Is there an efficient way of doing this?

To be concrete, suppose I have:

import pandas as pd
df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

I would like to select the rows where column B is 'John' or column C is 'John'. Is there a more elegant way to do this than:

df[(df['B']=='John') | (df['C']=='John')]

In my real application, df will have many rows and this selection will be run many times, so efficiency matters.

yurnero
  • The basic options are chaining with `|` or `query`. See [this post](https://stackoverflow.com/questions/12096252/use-a-list-of-values-to-select-rows-from-a-pandas-dataframe) for timings. –  May 04 '22 at 18:50
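For reference, a minimal sketch of the two options named in the comment above, chaining with `|` and `DataFrame.query`. The sample frame is modeled on the question, but 'John' is capitalized in the data here (an assumption) so that the case-sensitive comparison has rows to match. No timings are claimed; measure on your own data.

```python
import pandas as pd

# Sample frame modeled on the question. 'John' is capitalized here
# (an assumption) so the case-sensitive comparison finds matches.
df = pd.DataFrame({
    'A': ['mary', 'john', 'ashley'],
    'B': ['xiao', 'derric', 'John'],
    'C': ['faye', 'John', 'bruce'],
})

# Option 1: combine boolean masks with | (each comparison needs parentheses).
via_or = df[(df['B'] == 'John') | (df['C'] == 'John')]

# Option 2: express the same condition as a query string.
via_query = df.query("B == 'John' or C == 'John'")

print(via_or)
print(via_query)
```

Both forms select the same rows; `query` mainly trades the bracket-and-parenthesis noise for a string expression.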

1 Answer

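cols = ['A', 'B', ...]  # placeholder: list the columns to test, e.g. ['B', 'C']

# Boolean mask: True for rows where any listed column equals 'John'
mask = (df[cols] == 'John').any(axis='columns')
df[mask]  # the matching rows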
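As a hedged illustration of how this works: comparing a sub-frame to a scalar yields a boolean frame of the same shape, and `any(axis='columns')` reduces it to one boolean per row, which can then index the original frame. The sketch below assumes the columns of interest are B and C, as in the question, and capitalizes 'John' in the sample data (an assumption) because the comparison is case-sensitive.

```python
import pandas as pd

# Sample frame modeled on the question; 'John' is capitalized in the data
# (an assumption) so that the case-sensitive comparison has matches.
df = pd.DataFrame({
    'A': ['mary', 'john', 'ashley'],
    'B': ['xiao', 'derric', 'John'],
    'C': ['faye', 'John', 'bruce'],
})

cols = ['B', 'C']  # columns to test, taken from the question

# Step 1: element-wise comparison gives a boolean frame with columns B and C.
matches = df[cols] == 'John'

# Step 2: collapse each row to a single boolean (True if any column matched).
mask = matches.any(axis='columns')

# Step 3: use the mask to select the matching rows of the full frame.
selected = df[mask]
print(selected)
```

The advantage over chaining `|` by hand is that adding another column only means extending `cols`. Note that 'john' in column A is ignored, both because A is not in `cols` and because the match is case-sensitive.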
Ian Wright
  • While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. –  May 05 '22 at 00:57