1

I'm trying to select rows from a pandas DataFrame based on multiple conditions. The code looks like this:

row = videos_train_df[
             (videos_train_df['pid1']==pid1)
            &(videos_train_df['pid2']==pid2)
            &(videos_train_df['vid'] ==vid)]

Is there a more readable way to do the same thing?

  • https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html ? – MaxNoe Dec 23 '21 at 21:54
  • Do you find that difficult to read? I don't, especially the way you've spaced it. I think the meaning is quite clear. – Tim Roberts Dec 23 '21 at 21:54
  • @Tim Roberts This particular example isn't too difficult to read, but `df.query` is a nice way to avoid repeating the `videos_train_df` variable name in every condition. And with 100 conditions instead of 3, a string passed to `df.query` is easier to work with. – Derek O Dec 23 '21 at 21:57

2 Answers

3

You can use `DataFrame.query`:

row = videos_train_df.query(
    f"pid1 == {pid1} and pid2 == {pid2} and vid == {vid}"
)

See also this question.
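
One caveat with the f-string version: the values are pasted into the expression text, so a string-valued `pid1` would need its own quotes inside the query. A minimal sketch of an alternative, assuming the lookup values live in local variables: `query` can read them directly through the `@` prefix. The sample frame, its values, and the helper name `select_row` are made up for illustration; only the column names come from the question.

```python
import pandas as pd


def select_row(df, pid1, pid2, vid):
    # Hypothetical helper: "@name" refers to the local Python variable,
    # while a bare "name" refers to the DataFrame column.
    return df.query("pid1 == @pid1 and pid2 == @pid2 and vid == @vid")


# Made-up sample data for illustration.
videos_train_df = pd.DataFrame({
    "pid1": ["a", "a", "b"],
    "pid2": ["x", "y", "x"],
    "vid": [1, 2, 1],
})

row = select_row(videos_train_df, "a", "y", 2)
print(row)
```

Because the values are never formatted into the string, this form should behave the same for numeric and string columns.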

Andrea Di Iura
0

I would compare the columns with `eq` and then reduce across each row with `all`:

new = df[df[['pid1', 'pid2', 'vid']].eq([pid1, pid2, vid]).all(axis=1)]
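
A minimal sketch of how this might scale to many conditions, assuming the column names and target values are kept together in a dict so the two lists cannot drift out of order. The sample data and the name `conditions` are hypothetical.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "pid1": [10, 10, 11],
    "pid2": [20, 21, 20],
    "vid": [1, 2, 1],
})

# Hypothetical mapping of column name -> required value.
conditions = {"pid1": 10, "pid2": 21, "vid": 2}

cols = list(conditions)
# eq() compares each selected column with the value at the same position;
# all(axis=1) keeps only the rows where every comparison is True.
mask = df[cols].eq(list(conditions.values())).all(axis=1)
new = df[mask]
print(new)
```

Adding another condition then means adding one dict entry rather than another `&` clause.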
BENY