Given a graphlab.SFrame object with the following column names:

>>> import graphlab
>>> sf = graphlab.SFrame.read_csv('some.csv')
>>> sf.column_names()
['Dataset', 'Domain', 'Score', 'Sent1', 'Sent2']

It is easy to drop the rows that have a "not applicable" (NA) / None value in a particular column. For example, to drop the rows with NA values in the "Score" column, I could do this:

>>> sf.dropna('Score')

Or to replace the None value with a certain value (let's say -1), I could do this:

>>> sf.fillna('Score', -1)

After checking the SFrame docs at https://dato.com/products/create/docs/generated/graphlab.SFrame.html, I couldn't find a built-in function that returns the rows containing None in a certain column, something like sf.findna('Score'). I might have missed it, though.

If there is such a function, what is it called?

If there isn't, how should I extract the rows that have NA values in a specified column?

alvas

1 Answer

I think you can use a boolean array to identify the rows with missing values for a given column.

>>> import graphlab
>>> sf = graphlab.SFrame({'a': [1, 2, None, 4],
...                       'b': [None, 3, 1, None]})
>>> mask = sf['a'] == None
>>> mask
dtype: int
Rows: 4
[0, 0, 1, 0]
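
The mask can then be used to pick out the matching rows. Below is a minimal plain-Python sketch of that mask-then-filter idea, written so it runs without GraphLab installed. The list of dicts stands in for the SFrame, and `find_na` is a hypothetical helper, not a GraphLab function. On a real SFrame, I'd expect the documented logical filter `sf[mask]` (or `sf[sf['a'] == None]` in one step) to play the same role, but treat that as an assumption to check against your version.

```python
# Plain-Python stand-in for the SFrame above: one dict per row.
rows = [
    {'a': 1, 'b': None},
    {'a': 2, 'b': 3},
    {'a': None, 'b': 1},
    {'a': 4, 'b': None},
]


def find_na(rows, column):
    """Return the rows whose value in `column` is None (hypothetical helper)."""
    # Build a 0/1 mask, mirroring what `sf[column] == None` gives for an SFrame.
    mask = [1 if row[column] is None else 0 for row in rows]
    # Keep only the rows where the mask is 1 (the role `sf[mask]` would play).
    return [row for row, keep in zip(rows, mask) if keep]


missing_a = find_na(rows, 'a')
missing_b = find_na(rows, 'b')
print(missing_a)
print(missing_b)
```

Inverting the mask (keeping rows where it is 0) gives the complement, which is the set of rows that dropna would retain for that column.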
papayawarrior