Finding rows with "Not Applicable" value from a specific column from Graphlab SFrame

Question

Given a Graphlab.SFrame object with the following column names:

>>> import graphlab
>>> sf = graphlab.SFrame.read_csv('some.csv')
>>> sf.column_names()
['Dataset', 'Domain', 'Score', 'Sent1', 'Sent2']

One can easily drop the rows with a "not applicable" (NA) / None value in a particular column. For example, to drop rows with NA values in the "Score" column, I could do this:

>>> sf.dropna('Score')

Or, to replace the None values with a certain value (let's say -1), I could do this:

>>> sf.fillna('Score', -1)

After checking the SFrame docs at https://dato.com/products/create/docs/generated/graphlab.SFrame.html, I couldn't find a built-in function that finds the rows containing None in a certain column, something like sf.findna('Score'). It's possible I missed it.

If there is such a function, what is it called?

If there isn't, how should I extract the rows that have NA values in a specified column?

score 2 · Accepted Answer · answered Dec 17 '15 at 18:25

I think you can use a boolean array to identify the rows with missing values for a given column.

>>> import graphlab
>>> sf = graphlab.SFrame({'a': [1, 2, None, 4],
...                       'b': [None, 3, 1, None]})
>>> mask = sf['a'] == None
>>> mask
dtype: int
Rows: 4
[0, 0, 1, 0]
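
To actually extract those rows, the mask can be used to index the SFrame, since SFrame supports logical filtering with a 0/1 SArray. Below is a minimal sketch of that idea. The helper name rows_with_missing is made up for illustration and is not part of GraphLab, and the demo at the bottom only runs if GraphLab Create happens to be installed.

```python
def rows_with_missing(sf, column):
    """Return the rows of `sf` whose value in `column` is None.

    `sf` is assumed to behave like a graphlab.SFrame: comparing a column
    to None gives a 0/1 mask, and indexing the frame with that mask keeps
    only the rows where the mask is 1.
    """
    # `== None` is intentional: it is an elementwise comparison on the
    # column, whereas `is None` would test the column object itself.
    mask = sf[column] == None  # noqa: E711
    return sf[mask]


try:
    import graphlab
except ImportError:
    graphlab = None  # GraphLab Create is not installed; the demo is skipped

if graphlab is not None:
    sf = graphlab.SFrame({'a': [1, 2, None, 4],
                          'b': [None, 3, 1, None]})
    # Keeps only the third row, where 'a' is None
    print(rows_with_missing(sf, 'a'))
```

For the question's data this would be rows_with_missing(sf, 'Score'), or simply sf[sf['Score'] == None] inline. As far as I know, the SFrame docs also list dropna_split, which returns the rows kept by dropna and the rows it removed as two separate SFrames; that may be worth checking against your installed version.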
