
I have a DataFrame of research results that I want to filter so that only rows where all four values are the same remain:

```
Trial 1   Trial 2   Trial 3   Trail 4
Pass      Pass      Pass      Pass
Pass      Fail      Pass      Pass
Pass      Pass      Fail      Fail
```

I have tried `df.trial1 == df.trial2`, which works when I only want consistent results for Trial 1 and Trial 2. But when I want consistent results for all four and use `(df.trial1 == df.trial2) & (df.trial3 == df.trial4)`, the filter also treats rows like Pass Pass Fail Fail as consistent. I want to keep only the rows that are consistent across all four trials, using the same sort of simple syntax.

Thank you in advance

techytushar
  • I'm not sure if this is the most efficient way, but what about checking `(df.trial1 == df.trial2) & (df.trial2 == df.trial3) & (df.trial3 == df.trial4)`? – user1558604 Dec 07 '19 at 14:39
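A minimal sketch contrasting the question's check with the one suggested in this comment, assuming the sample table is loaded with hypothetical lowercase column names `trial1` to `trial4` so that attribute access such as `df.trial1` works (the real names with spaces would need `df['Trial 1']` instead). The question's version compares Trial 1 with Trial 2 and Trial 3 with Trial 4 but never links the two pairs, which is why Pass Pass Fail Fail gets through; chaining neighbouring comparisons ties all four columns together.

```python
import pandas as pd

# Sample data from the question; the lowercase column names are an
# assumption so that attribute access (df.trial1) works.
df = pd.DataFrame({
    'trial1': ['Pass', 'Pass', 'Pass'],
    'trial2': ['Pass', 'Fail', 'Pass'],
    'trial3': ['Pass', 'Pass', 'Fail'],
    'trial4': ['Pass', 'Pass', 'Fail'],
})

# Question's check: 1 == 2 and 3 == 4, but the two pairs are never
# compared with each other, so row 2 (Pass Pass Fail Fail) slips through.
pairs_only = (df.trial1 == df.trial2) & (df.trial3 == df.trial4)

# Comment's check: each column equals the next one, so by transitivity
# all four values in the row are equal.
chained = (df.trial1 == df.trial2) & (df.trial2 == df.trial3) & (df.trial3 == df.trial4)

print(pairs_only.tolist())  # [True, False, True]
print(chained.tolist())     # [True, False, False]
print(df[chained])          # only row 0 remains
```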

1 Answer


Use DataFrame.nunique with axis=1 to count the distinct values in each row; a row is consistent when that count is exactly 1. Note that this solution can be slow on a large DataFrame:

```python
cols = ['Trial 1', 'Trial 2', 'Trial 3', 'Trail 4']

# count distinct values per row; exactly 1 means all four trials agree
mask = df[cols].nunique(axis=1) == 1
print(mask)
```
```
0     True
1    False
2    False
dtype: bool
```
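The answer prints the mask but does not show the filtering step itself. A minimal sketch, assuming the sample data from the question, that builds the DataFrame and keeps only the consistent rows with boolean indexing:

```python
import pandas as pd

cols = ['Trial 1', 'Trial 2', 'Trial 3', 'Trail 4']

# Sample data from the question (column names copied as written there).
df = pd.DataFrame(
    [['Pass', 'Pass', 'Pass', 'Pass'],
     ['Pass', 'Fail', 'Pass', 'Pass'],
     ['Pass', 'Pass', 'Fail', 'Fail']],
    columns=cols,
)

# A row is consistent when it contains exactly one distinct value.
mask = df[cols].nunique(axis=1) == 1

# Boolean indexing keeps only the rows where the mask is True.
consistent = df[mask]
print(consistent)
```

`nunique` ignores missing values by default (`dropna=True`), so a row such as Pass, Pass, NaN, Pass would also count as consistent; pass `dropna=False` if a missing result should make the row inconsistent.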

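Alternatively, compare every column with the first column using DataFrame.eq (with axis=0 so the first column is matched row by row), then use DataFrame.all(axis=1) to check that every comparison in a row is True: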

```python
# True where every trial column matches the first one in its row
mask = df[cols].eq(df[cols[0]], axis=0).all(axis=1)
```

Detail:

```python
print(df[cols].eq(df[cols[0]], axis=0))
```
```
   Trial 1  Trial 2  Trial 3  Trail 4
0     True     True     True     True
1     True    False     True     True
2     True     True    False    False
```
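A self-contained sketch of the `eq`/`all` approach on the question's sample data, which also applies the mask to select the rows:

```python
import pandas as pd

cols = ['Trial 1', 'Trial 2', 'Trial 3', 'Trail 4']

df = pd.DataFrame(
    [['Pass', 'Pass', 'Pass', 'Pass'],
     ['Pass', 'Fail', 'Pass', 'Pass'],
     ['Pass', 'Pass', 'Fail', 'Fail']],
    columns=cols,
)

# Compare every column with the first one; axis=0 aligns the
# Series on the row index, so each row uses its own first value.
same_as_first = df[cols].eq(df[cols[0]], axis=0)

# A row is consistent only if every comparison in it is True.
mask = same_as_first.all(axis=1)
print(df[mask])
```

Because `NaN` never compares equal to anything, this version treats a row that mixes Pass values with a missing value as inconsistent, unlike the default `nunique` check.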
jezrael
  • Thanks, the only problem with this is that the actual DataFrame has lots of other columns; I have only included the ones I'm interested in. – Nicola Hodge Dec 07 '19 at 14:41
  • @NicolaHodge - The answer was edited: use a list of the column names to test, here called `cols`. – jezrael Dec 07 '19 at 14:46
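Regarding the extra columns: the test only needs to look at `cols`, but the resulting mask selects whole rows, so the other columns are kept. A minimal sketch in which the hypothetical `Subject` and `Notes` columns stand in for the rest of the real DataFrame:

```python
import pandas as pd

cols = ['Trial 1', 'Trial 2', 'Trial 3', 'Trail 4']

# Hypothetical extra columns ('Subject', 'Notes') stand in for the
# other content in the real DataFrame.
df = pd.DataFrame({
    'Subject': ['A', 'B', 'C'],
    'Trial 1': ['Pass', 'Pass', 'Pass'],
    'Trial 2': ['Pass', 'Fail', 'Pass'],
    'Trial 3': ['Pass', 'Pass', 'Fail'],
    'Trail 4': ['Pass', 'Pass', 'Fail'],
    'Notes': ['ok', 'retest', 'retest'],
})

# Only the trial columns take part in the consistency test.
mask = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

# The mask filters whole rows, so every column of df is kept.
result = df.loc[mask]
print(result)
```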