I have a pandas DataFrame in the following format:

df.head()

        y   y_pred
599     0   0
787     9   9
47      2   2
1237    1   1
1069    6   6

I want to find the rows (or their index labels) where y != y_pred.

I have been trying to do this with a select-style filter but cannot get it to work. Please help.

TIA

chhibbz

2 Answers


Use query:

idx = df.query('y != y_pred').index

Sample:

print (df)
      y  y_pred
599   0       1 <- value changed to create a mismatch
787   9       9
47    2       2
1237  1       1
1069  6       3 <- value changed to create a mismatch

idx = df.query('y != y_pred').index
print (idx)
Int64Index([599, 1069], dtype='int64')

A solution with boolean indexing:

df1 = df[df.y != df.y_pred].index
print (df1)
Int64Index([599, 1069], dtype='int64')

Or see the other answer, which indexes df.index directly.

To see the rows whose values differ:

print (df.query('y != y_pred'))
      y  y_pred
599   0       1
1069  6       3

print (df[df.y != df.y_pred])
      y  y_pred
599   0       1
1069  6       3
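
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample above, and the names idx_query and idx_mask are only illustrative; they are not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt by hand from the answer above
# (rows 599 and 1069 are deliberately mismatched).
df = pd.DataFrame(
    {"y": [0, 9, 2, 1, 6], "y_pred": [1, 9, 2, 1, 3]},
    index=[599, 787, 47, 1237, 1069],
)

# Index labels of the mismatched rows, selected with query ...
idx_query = df.query("y != y_pred").index

# ... and with a boolean mask.
idx_mask = df[df["y"] != df["y_pred"]].index

print(list(idx_query))  # [599, 1069]
print(list(idx_mask))   # [599, 1069]
```

Assigning the result to a separate name keeps df intact, so both approaches can be run against the same frame.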
jezrael

Try:

df.index[df.y != df.y_pred]

Let's alter your sample data

df.iloc[0, 0] = 1
df.iloc[3, 1] = 0
print(df)

      y  y_pred
599   1       0
787   9       9
47    2       2
1237  1       0
1069  6       6

Then run the same code:

df.index[df.y != df.y_pred]

Int64Index([599, 1237], dtype='int64')

For better performance, compare the underlying NumPy arrays:

df.index.values[df.y.values != df.y_pred.values]

array([ 599, 1237])

You can return the matching subset of df with:

df.loc[(df.y != df.y_pred).values]

      y  y_pred
599   1       0
1237  1       0
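
The NumPy-based steps above can be put together in one runnable sketch. The frame is rebuilt by hand from the altered sample, and the variable names are illustrative. The np.flatnonzero call is an addition that is not in the original answer; it is included on the assumption that "index numbers" in the question may mean integer row positions rather than index labels.

```python
import numpy as np
import pandas as pd

# Altered sample data from this answer (rows 599 and 1237 mismatch).
df = pd.DataFrame(
    {"y": [1, 9, 2, 1, 6], "y_pred": [0, 9, 2, 0, 6]},
    index=[599, 787, 47, 1237, 1069],
)

# Compare the raw NumPy arrays, which skips pandas index alignment.
mask = df["y"].to_numpy() != df["y_pred"].to_numpy()

# Index labels of the mismatched rows.
labels = df.index.to_numpy()[mask]

# Integer row positions of the mismatched rows
# (assumption: this is what "row numbers" might mean).
positions = np.flatnonzero(mask)

# The mismatched rows themselves.
subset = df.loc[mask]

print(labels)     # [ 599 1237]
print(positions)  # [0 3]
```

Labels work with df.loc and positions with df.iloc, so which one to keep depends on how the result will be used.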
piRSquared