I am new to Python and pandas.

I want to query a dataframe and filter the rows where one of the columns is not NaN.

I have tried:

a=dictionarydf.label.isnull()

but a is just a boolean Series of True/False values, not the filtered rows. I then tried this:

dictionarydf.query(dictionarydf.label.isnull())

but it raised an error, as I expected.

Sample data:

    reference_word         all_matching_words   label review
0          account             fees - account     NaN      N
1          account           mobile - account     NaN      N
2          account          monthly - account     NaN      N
3   administration  delivery - administration     NaN      N
4   administration      fund - administration     NaN      N
5          advisor             fees - advisor     NaN      N
6          advisor          optimum - advisor     NaN      N
7          advisor              sub - advisor     NaN      N
8            aichi           delivery - aichi     NaN      N
9            aichi               pref - aichi     NaN      N
10         airport              biz - airport  travel      N
11         airport              cfo - airport  travel      N
12         airport           cfomtg - airport  travel      N
13         airport          meeting - airport  travel      N
14         airport           summit - airport  travel      N
15         airport             taxi - airport  travel      N
16         airport            train - airport  travel      N
17         airport         transfer - airport  travel      N
18         airport             trip - airport  travel      N
19             ais                admin - ais     NaN      N
20             ais               alpine - ais     NaN      N
21             ais                 fund - ais     NaN      N
22      allegiance       custody - allegiance     NaN      N
23      allegiance          fees - allegiance     NaN      N
24           alpha               late - alpha     NaN      N
25           alpha               meal - alpha     NaN      N
26           alpha               taxi - alpha     NaN      N
27          alpine             admin - alpine     NaN      N
28          alpine               ais - alpine     NaN      N
29          alpine              fund - alpine     NaN      N

I want to keep only the rows where label is not NaN.

Expected output:

   reference_word  all_matching_words   label review
0         airport       biz - airport  travel      N
1         airport       cfo - airport  travel      N
2         airport    cfomtg - airport  travel      N
3         airport   meeting - airport  travel      N
4         airport    summit - airport  travel      N
5         airport      taxi - airport  travel      N
6         airport     train - airport  travel      N
7         airport  transfer - airport  travel      N
8         airport      trip - airport  travel      N
jezrael
DileepGogula
    Does this answer your question? [Python pandas Filtering out nan from a data selection of a column of strings](https://stackoverflow.com/questions/22551403/python-pandas-filtering-out-nan-from-a-data-selection-of-a-column-of-strings) – Rohit Nandi Apr 14 '21 at 14:51

1 Answer

You can use dropna with the subset parameter, so that only NaN values in the label column are considered:

df = df.dropna(subset=['label'])

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
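
To try this without the full table, here is a minimal sketch on a cut-down frame (the four rows below are a hypothetical stand-in for the sample data). The reset_index(drop=True) step is my assumption: the expected output in the question is numbered from 0, whereas dropna keeps the original row labels.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the sample data from the question.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Keep only the rows where 'label' is present; other columns are not checked.
filtered = df.dropna(subset=["label"])

# Optional: renumber the rows from 0, as in the expected output.
renumbered = filtered.reset_index(drop=True)

print(renumbered)
```

If the original row labels matter later (for example to join back to the full table), skip the reset_index step.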

Another solution - boolean indexing with notnull:

df = df[df.label.notnull()]

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
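
Since the question tried DataFrame.query, here is a sketch of how the same filter might be written there, again on a hypothetical cut-down frame. query expects an expression string rather than a boolean Series, which is the likely reason dictionarydf.query(dictionarydf.label.isnull()) failed. The "label == label" form relies on NaN never comparing equal to itself. The names by_mask, by_query and by_method are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the sample data from the question.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
})

# Boolean mask: True where 'label' is present (notna is an alias of notnull).
mask = df["label"].notna()
by_mask = df[mask]

# query() takes an expression string, not a boolean Series.
# NaN is never equal to itself, so this keeps only the non-NaN rows.
by_query = df.query("label == label")

# Method-call form; engine="python" is passed defensively here.
by_method = df.query("label.notna()", engine="python")

print(by_mask)
```

Note that dropna, boolean indexing and query all return a new DataFrame; df itself is unchanged unless you assign the result back to it.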
jezrael
    Thanks for the quick answer :) @jezrael, that solved the problem. I opted for the boolean indexing because I don't want to drop the rows and I don't need to create a duplicate DataFrame either. Both solutions worked perfectly. – DileepGogula Sep 26 '16 at 06:09