2

I am trying to see if any values change in my dataframe, from row to row. One column in particular has a lot of None values.
If I check for inequality between None values, element-wise, it gives the answer I expect (that is, None != None returns False), but not when I do that same comparison on the column.

Here's a minimal working example:

>>> example_data = [None]*3 + ['a','b','b'] + [None]*4
>>> df = pd.DataFrame(example_data, columns = ['col1'])
>>> print df

    col1
0   None
1   None
2   None
3   a
4   b
5   b
6   None
7   None
8   None
9   None

To see if the value changed from one line to the next, I'm shifting everything and comparing the values:

>>> did_it_change = (df != df.shift(1))
>>> print did_it_change

    col1
0   True
1   True
2   True
3   True
4   True
5   False
6   True
7   True
8   True
9   True

So long as None is not involved in the comparison, it's accurate (e.g. row 5 changed the value to b from its previous value of b, so it did not change).

If I check a single element, it works as expected:

>>> df.loc[1,'col1'] != df.loc[2,'col1']
False

But that's not the answer that appears in the full column: did_it_change.loc[1,:] is True.

I do get the expected outcome if I use "" instead of None, but replacing None with empty string like in this question seems arbitrary and superfluous.

This is the outcome I'd expect.

>>> example_data = [""]*3 + ['a','b','b'] + [""]*4
>>> df = pd.DataFrame(example_data, columns = ['col1']) 
>>> did_it_change = (df != df.shift(1))
>>> print did_it_change

    col1
0   True
1   False
2   False
3   True
4   True
5   False
6   True
7   False
8   False
9   False

What's going on here? Why does the comparison with None give different answers if I'm calling it on a dataframe column instead of its elements?

Community
  • 1
  • 1
Aturen
  • 93
  • 1
  • 7
  • 1
    I suspect that `None` is getting converted to `np.nan` underneath the hood when doing DataFrame level operations, which would explain the output as `np.nan != np.nan` returns `True` by definition. See the output of `df == df`. – root Mar 21 '17 at 20:44
  • Good theory! I just ran `df == df` and sure enough, the `None` values are not equal to themselves. At least now I know why it's happening. I still think this is undesired behavior... – Aturen Mar 21 '17 at 20:55

0 Answers0