2

For me, the following snippet leaves the NaN value as NaN:

import pandas
a = [12, 23]
b = [123, None]
c = [1234, 2345]
d = [12345, 23456]
tuples = [('eyes', 'left'), ('eyes', 'right'), ('ears', 'left'), ('ears', 'right')]
events = {('eyes', 'left'): a, ('eyes', 'right'): b, ('ears', 'left'): c, ('ears', 'right'): d}
multiind = pandas.MultiIndex.from_tuples(tuples, names=['part', 'side'])
zed = pandas.DataFrame(events, index=['a', 'b'], columns=multiind)
zed['eyes']['right'].fillna(value=555, inplace=True)

I get:

part  eyes         ears       
side  left  right  left  right
a       12    123  1234  12345
b       23    NaN  2345  23456

If I run this with inplace set to False, the returned Series has replaced NaN with 555. I could use this work-around, but on the one hand, if it's a bug I want to report it, and on the other hand, even the work-around doesn't work for my actual application.

So the question is whether I misunderstand fillna() or this is a bug. Thanks!

Edit: I'm using pandas 0.12.0, numpy 1.8.0, and python 2.7.5 on openSUSE 13.1.

crantila

2 Answers

2

I would use update here, since it's more explicit and avoids the whole problem of updating a copy.

First select the subframe where the column is (eyes, right):

In [11]: zed.loc[:, [('eyes', 'right')]]
Out[11]: 
part   eyes
side  right
a       123
b       NaN    
[2 rows x 1 columns]

Fill in the NaN with 555, and update:

In [12]: zed.loc[:, [('eyes', 'right')]].fillna(555)
Out[12]: 
part   eyes
side  right
a       123
b       555
[2 rows x 1 columns]

In [13]: zed.update(zed.loc[:, [('eyes', 'right')]].fillna(555))

In [14]: zed
Out[14]: 
part  eyes         ears       
side  left  right  left  right
a       12    123  1234  12345
b       23    555  2345  23456
[2 rows x 4 columns]

Similar to chaining in an assignment, inplace calls on a chained selection such as:

zed['eyes']['right'].fillna(value=555, inplace=True)
zed.loc[:,[('eyes', 'right')]].fillna(value=555, inplace=True)

may sometimes work, but don't count on it (@Jeff suggests it may work if all columns are floats!). It's likely you'll end up modifying a copy and not the original frame.
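As a rough sketch of the "avoid inplace on a chained selection" advice, the fill can also be written as a single explicit assignment back to the frame. This is not taken from the answer above (which uses update); the variable name `filled` and the way the frame is built from a list of rows are assumptions made for illustration, and the data simply mirrors the question.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (assumed construction).
columns = pd.MultiIndex.from_tuples(
    [('eyes', 'left'), ('eyes', 'right'), ('ears', 'left'), ('ears', 'right')],
    names=['part', 'side'],
)
zed = pd.DataFrame(
    [[12, 123.0, 1234, 12345],
     [23, np.nan, 2345, 23456]],
    index=['a', 'b'],
    columns=columns,
)

# Select the column with a single tuple key (no chained indexing),
# fill the NaN in a new Series, then assign that Series back to the frame.
filled = zed[('eyes', 'right')].fillna(555)
zed[('eyes', 'right')] = filled

print(zed)
```

Because the result of fillna is assigned back through one indexing operation on zed itself, it does not matter whether the intermediate selection was a view or a copy.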

Andy Hayden
  • Here's a PR to show a warning when you try to do this like the OP: https://github.com/pydata/pandas/pull/5679. This can also be done (in this case) by ``zed.loc[:, [('eyes','right')]].fillna(555, inplace=True)``. I say in this case because you have a single-dtyped frame (meaning all dtypes are floats). @Andy Hayden's solution is best, as IMHO it is very clear what you are doing. – Jeff Dec 11 '13 at 14:04
  • @Jeff interestingly, I thought I tested that and it didn't work for me! Explicit is definitely the correct choice here. – Andy Hayden Dec 11 '13 at 19:00
  • So the answer to my question: I misunderstood `fillna()`. Thanks for your detailed explanation of how you built the correct version. I'd seen all of these things before, and thought "okay, whatever," but now it really makes sense why they exist. And a new warning to help prevent future-me from doing this again? Such service, all the responsive developers, wow! – crantila Dec 11 '13 at 22:09
  • @crantila Glad you got it working! I think the thing to take away is that "inplace chaining" (more than one operation in the same line), even though it *sometimes* works, is best avoided. E.g. [chaining an assignment](http://stackoverflow.com/questions/19867734/changing-certain-values-in-multiple-columns-of-a-pandas-dataframe-at-once/19867768#19867768). – Andy Hayden Dec 11 '13 at 23:06
0

fillna() is meant to replace NaN values with something else, not to insert NaN into null data slots. See this example for details:

In [23]: df2

        one       two     three four   five           timestamp
a       NaN  1.138469 -2.400634  bar   True                 NaT
c       NaN  0.025653 -1.386071  bar  False                 NaT
e  0.863937  0.252462  1.500571  bar   True 2012-01-01 00:00:00
f  1.053202 -2.338595 -0.374279  bar   True 2012-01-01 00:00:00
h       NaN -1.157886 -0.551865  bar  False                 NaT
[5 rows x 6 columns]

In [24]: df2.fillna(0)

        one       two     three four   five           timestamp
a  0.000000  1.138469 -2.400634  bar   True 1970-01-01 00:00:00
c  0.000000  0.025653 -1.386071  bar  False 1970-01-01 00:00:00
e  0.863937  0.252462  1.500571  bar   True 2012-01-01 00:00:00
f  1.053202 -2.338595 -0.374279  bar   True 2012-01-01 00:00:00
h  0.000000 -1.157886 -0.551865  bar  False 1970-01-01 00:00:00
[5 rows x 6 columns]

Note that NaT is Not a Time.

MattDMo