You could also call the where() method on the DataFrame object directly, passing the condition as the first argument. See the following example:
dataset.where(dataset['class']==0)
This gives the following output:
f000001 f000002 f000003 ... f000102 f000103 class
0 0.000000 0.000000 0.000000 ... 0.000000 0.080000 0.0
1 0.000000 0.000000 0.000000 ... 0.000000 0.058824 0.0
2 0.000000 0.000000 0.000000 ... 0.000000 0.095238 0.0
3 0.029867 0.000000 0.012769 ... 0.000000 0.085106 0.0
4 0.000000 0.000000 0.000000 ... 0.000000 0.085106 0.0
5 0.000000 0.000000 0.000000 ... 0.000000 0.085106 0.0
6 0.000000 0.000000 0.000000 ... 0.000000 0.127660 0.0
7 0.000000 0.000000 0.000000 ... 0.000000 0.106383 0.0
8 0.000000 0.000000 0.000000 ... 0.000000 0.127660 0.0
9 0.000000 0.000000 0.000000 ... 0.000000 0.106383 0.0
10 0.000000 0.000000 0.000000 ... 0.000000 0.085106 0.0
11 0.021392 0.000000 0.000000 ... 0.000000 0.042553 0.0
12 -0.063880 -0.124403 -0.102466 ... 0.000000 0.042553 0.0
13 0.000000 0.000000 0.000000 ... 0.000000 0.021277 0.0
14 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.0
15 0.000000 0.000000 -0.060884 ... 0.000000 0.000000 0.0
[18323 rows x 104 columns]
(I got rid of the rest of the output for brevity of the answer)
A big advantage of this method over plain boolean indexing is that you can replace the values that don't match the condition using the other argument (they become NaN by default), and you can modify the DataFrame itself instead of getting a new one back by using the inplace argument. Basically, you can reconstruct the rows of your dataframe as desired.
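As a minimal sketch of how other and inplace behave, assuming a small made-up DataFrame (the toy data and the names df, mask, replaced and df_inplace are hypothetical, not the dataset from the question):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({"f000001": [0.1, 0.2, 0.3], "class": [0, 1, 0]})
mask = df["class"] == 0

# Rows where the condition is False get their values replaced by `other`
replaced = df.where(mask, other=-1)

# With inplace=True the call returns None and modifies the DataFrame itself
df_inplace = df.copy()
result = df_inplace.where(mask, other=-1, inplace=True)
```

Here the row with class 1 has every value replaced by -1, while the rows with class 0 are left untouched.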
Additionally, because this function returns a dataframe of the same shape, with NaN in the rows that don't match the condition, you can select a specific column from the result, such as
dataset.where(dataset['class']==0)['f000001']
And this will return the 'f000001' (first feature) column for you, with NaN in the rows where the class label is not 0.
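To see how where() differs from plain boolean indexing, here is a small sketch on made-up data (the toy DataFrame and variable names are hypothetical): where() keeps every row and masks the non-matching ones, while boolean indexing drops them.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({"f000001": [0.1, 0.2, 0.3], "class": [0, 1, 0]})
mask = df["class"] == 0

# where(): same shape as df, non-matching rows become NaN
masked = df.where(mask)

# Boolean indexing: non-matching rows are removed entirely
filtered = df[mask]

# Single column, both ways
col_masked = df.where(mask)["f000001"]   # NaN where class != 0
col_filtered = df.loc[mask, "f000001"]   # only rows where class == 0
```

Note that the masked 'class' column becomes float, since NaN can't be stored in an integer column. That is why the output above shows 0.0 rather than 0. If you only want the matching rows, boolean indexing (or .loc as shown) is usually the more direct choice.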