Given a series

s = pd.Series([1.1, 1.2, np.nan])
s
0    1.1
1    1.2
2    NaN
dtype: float64

If I need to convert the NaNs to None (for example, to work with Parquet files), I would like to get

0     1.1
1     1.2
2    None
dtype: object

I would assume Series.replace would be the obvious way of doing this, but here's what the function returns:

s.replace(np.nan, None)

0    1.1
1    1.2
2    1.2
dtype: float64

The NaN was forward filled, instead of being replaced. Going through the docs, I see that if the second argument is None, then the first argument should be a dictionary. Based on this, I would expect replace to either replace as intended, or throw an exception.

I believe the workaround here is

pd.Series([x if pd.notna(x) else None for x in s], dtype=object) 
0     1.1
1     1.2
2    None
dtype: object
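A self-contained version of this workaround might look like the sketch below. The helper name `nan_to_none` is hypothetical and not part of pandas, and passing `index=series.index` is an addition to the one-liner above so that a non-default index is preserved.

```python
import numpy as np
import pandas as pd


def nan_to_none(series):
    """Return an object-dtype copy of `series` with missing values set to None.

    Hypothetical helper for illustration; not a pandas API.
    """
    return pd.Series(
        [x if pd.notna(x) else None for x in series],
        index=series.index,  # keep the original index (assumption: desired here)
        dtype=object,        # object dtype so None is stored as-is
    )


s = pd.Series([1.1, 1.2, np.nan])
result = nan_to_none(s)
print(result)
```

Building the list element by element avoids `replace` entirely, so the outcome should not depend on how `replace` treats `value=None`.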

That works, but I would like to understand why this behaviour occurs, whether it is documented, or whether it is just a bug and I have to dust off my git profile and log one on the issue tracker. Any ideas?

cs95
  • `s.where(s.notnull(),None)` is another cleaner workaround I guess – Vivek Kalyanarangan Jan 03 '19 at 11:38
  • @VivekKalyanarangan Thank you! Will file that away for future reference... – cs95 Jan 03 '19 at 11:39
  • @VivekKalyanarangan Hmm, I don't believe so, this question is specifically with respect to the behaviour of replace. What do you think? – cs95 Jan 03 '19 at 11:41
  • I believe this is in the docs: `The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None`. when referring to method parameter, so when value is None the method used is pad (the default) – Dani Mesejo Jan 03 '19 at 11:41
  • to me this looks like a bug, I would expect it to throw an exception or do nothing, forward filling is incorrect, I would file this as an issue: https://github.com/pandas-dev/pandas/issues – EdChum Jan 03 '19 at 11:42
  • "The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None." Can you specify in which part it says if second argument is None the first should be a dictionary? – ayhan Jan 03 '19 at 11:42
  • @ayhan Sure, in the docs section under the `to_replace` argument, it says "Dicts can be used to specify different replacement values for different existing values. [...] To use a dict in this way the value parameter should be None." – cs95 Jan 03 '19 at 11:43
  • @coldspeed Yes, now I get it, it is different. The worst part is that I am now going through some of my own implementations just to check whether a bug has crept in because of this. Thanks for the question! `s.replace(np.nan, None)` is in fact counterintuitive when it forward fills – Vivek Kalyanarangan Jan 03 '19 at 11:43
  • this works `s.replace({np.nan:None})` but I'd expect the less verbose method to behave the same – EdChum Jan 03 '19 at 11:44
  • That's the other way around though. If you pass a dict, then value should be None. That doesn't mean if the value is None `to_replace` should be a dict though? – ayhan Jan 03 '19 at 11:44
  • @EdChum Thanks for weighing in, and for the suggested workaround, which is even cleaner! I guess I'll get to filing that bug soon... – cs95 Jan 03 '19 at 11:45
  • @ayhan Now that you mention it... – cs95 Jan 03 '19 at 11:47
  • Here's Nicki's [workaround](https://stackoverflow.com/questions/40663225/unexpected-pandas-series-replace-behavior#comment68557738_40663225). We might close that as a duplicate if you get an authoritative response to this one. – ayhan Jan 03 '19 at 11:56
  • @ayhan Two years and still the same thing :D This is a dupe, so I will close it, but I think it is worth investigating... – cs95 Jan 03 '19 at 12:01
  • @ayhan If you think it is better left open for a dev to write an answer, then feel free to reopen... I am fine with anything. – cs95 Jan 03 '19 at 12:02
  • I believe it was not documented when I wrote that answer. I remember figuring it out by trial and error. Let's give this some time if anybody wants to investigate further. – ayhan Jan 03 '19 at 12:25
  • See: https://github.com/pandas-dev/pandas/issues/19998 – root Jan 05 '19 at 00:11

1 Answer

This behaviour is in the documentation of the method parameter:

method : {‘pad’, ‘ffill’, ‘bfill’, None}

The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None.

So in your example `to_replace` is a scalar and `value` is None. The default method is pad, which the documentation of fillna describes as:

pad / ffill: propagate last valid observation forward to next valid
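As a rough sketch of how to sidestep the `value=None` code path, the two alternatives mentioned in the comments can be written as follows. The dict form is taken from EdChum's comment. The `astype(object)` cast before `where` is an assumption added here so that None is stored in an object column rather than being coerced back to NaN in a float column. The output of a bare `s.replace(np.nan, None)` is deliberately not shown, since it may differ between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])

# Dict form: the mapping lives in to_replace, so `value` is never passed
# and the fill-method fallback described above does not come into play.
via_dict = s.replace({np.nan: None})

# where() on an object-dtype copy: keep entries where the mask is True,
# put None everywhere else. The cast to object is an added assumption.
via_where = s.astype(object).where(s.notna(), None)

print(via_dict.tolist())
print(via_where.tolist())
```

Either form states the intended replacement explicitly, which seems safer than relying on how a scalar `to_replace` combined with `value=None` is interpreted.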
Dani Mesejo
  • This still doesn't explain why the NaNs are forward filled, though? – cs95 Jan 03 '19 at 11:46
  • Well that would suggest `s.replace(np.nan, None, method=None)` would work but it doesn't and borks – EdChum Jan 03 '19 at 11:47
  • @coldspeed the method by default is pad – Dani Mesejo Jan 03 '19 at 11:47
  • So, you're saying `s.replace(np.nan, None)` is treated to be `s.replace(np.nan, None, method='pad')` and hence forward filled? Hmm, that does make sense, but I don't know if it is the intended behaviour or the right behaviour to display in this case. – cs95 Jan 03 '19 at 11:49
  • @coldspeed Yes, from the documentation that is what it seems. – Dani Mesejo Jan 03 '19 at 11:50
  • It is possible this is by design, and that the only way to replace NaNs with None is using a dictionary, as EdChum observed. I can accept this explanation, although I would like some closure from the devs. Will open an issue and see what they say. Thanks for shedding some light on this! – cs95 Jan 03 '19 at 11:50
  • @EdChum Setting the second argument to `None` overwrites the method though. – ayhan Jan 03 '19 at 11:50
  • To me this is unexpected, it's a special edge case which is not what I would expect given that if there was no match say `s.replace('foo',None)` then it would return the original series unchanged – EdChum Jan 03 '19 at 11:50
  • @ayhan sorry I don't understand, you mean you expect this to work: `s.replace(np.nan, value=None, method=None)` or that passing `None` has unintended consequences? – EdChum Jan 03 '19 at 11:52
  • @EdChum I believe `value=None` triggers `method='pad'` so `method=None` is ignored (based on the order of arguments). I agree it is unintuitive but I think it works as documented. – ayhan Jan 03 '19 at 11:54
  • @ayhan ah OK, to me this is weird, it will probably not be changed given it's documented but it's unexpected to me, I wouldn't expect this behaviour, normally nothing happens or the exact matched value is replaced, I wouldn't use `replace` to `ffill` or `bfill` as a consequence – EdChum Jan 03 '19 at 11:56