51

I have the following dictionary:

fillna(value={'first_name':'Andrii', 'last_name':'Furmanets', 'created_at':None})

When I pass that dictionary to fillna I see:

raise ValueError('must specify a fill method or value')
ValueError: must specify a fill method or value

It seems to me that it fails on the None value.

I use pandas version 0.20.3.

Andrii Furmanets
  • 2
    in a float column, None is de-facto represented by np.nan (and in most types). So this doesn't make any sense. see the docs here from https://github.com/pandas-dev/pandas/issues/10871 – BENY Sep 18 '17 at 16:00
  • `d= {'first_name': 'Andrii', 'last_name':'Furmanets'}` – BENY Sep 18 '17 at 16:05
  • The dictionary comes from outside and must go through the pandas, there is created_at field, it seems to me that it worked with pandas 0.18.0. – Andrii Furmanets Sep 18 '17 at 16:08

5 Answers

86

If you want to normalize all of the nulls to Python's None:

df.fillna(np.nan).replace([np.nan], [None])

The first fillna replaces every null marker (None, NaT, np.nan, etc.) with NumPy's NaN; the replace then swaps NumPy's NaN for Python's None.
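As a minimal sketch of the two-step idiom above, assuming a recent pandas release and a frame that holds only float columns (the frame and its column names are made up for illustration, and how `replace` treats `None` has varied between pandas versions):

```python
import numpy as np
import pandas as pd

# Hypothetical float-only frame with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 2.0]})

# Step 1: map every null marker to NaN.
# Step 2: swap NaN for Python's None. A float column cannot hold None,
# so this step moves the affected columns to object dtype.
normalized = df.fillna(np.nan).replace([np.nan], [None])

print(normalized.dtypes)
print(normalized["A"].tolist())
print(normalized["B"].tolist())
```

Checking `normalized.dtypes` afterwards is a quick way to confirm on your own pandas version that the columns really became object dtype rather than staying float with NaN.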

AsaridBeck91
  • 5
    To me, this was the most simple way to apply None to entire dataframe. – su79eu7k Sep 09 '20 at 09:06
  • 6
    `df.replace([np.nan], [None])` this is sufficient – mangusta Jul 21 '21 at 21:55
  • 1
    @mangusta For most cases you're right, but If you have other types of null (e.g. pd.NaT) you won't necessarily get python's None after `replace`. Starting with `fillna` is more consistent. – AsaridBeck91 Jul 22 '21 at 11:48
  • 1
    Can someone explain why this works? In my case, I was using only `.replace({np.nan: None})` and not `.fillna()`. I wanted to turn all nan values to None but sometimes I had perfect DF where there were no nan, only some None values, but `.replace()` turned all None to nan when it was supposed to do the opposite? – rain01 Aug 06 '21 at 17:00
  • Very useful when using xlwings, which doesn't support `NA`s, but supports `None`s. – Ronan Paixão Jun 15 '22 at 16:39
  • 2
    Why are the list brackets necessary? – michen00 Sep 21 '22 at 04:57
  • For my own specific case, I used this two-steps way following the answer: `df = df.fillna('').replace([''], [None])` and it is okay for me because I also want the empty strings to be None. – Domenico Spidy Tamburro Nov 03 '22 at 08:28
  • @michen00 I believe the brackets are necessary because `None` passed as `value` is not interpreted as value to actually set in the df but as "default" and the default must be to set to `np.nan`, which leads to a recursion issue: NaN replaced with Nan replaced with Nan,... – Jérôme Nov 24 '22 at 09:17
23

Setup
Consider the sample dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))

df

     A    B     C
0  1.0  NaN  None
1  NaN  2.0     D

I can confirm the error

df.fillna(dict(A=1, B=None, C=4))
ValueError: must specify a fill method or value

This happens because pandas cycles through the keys in the dictionary and executes a fillna for each relevant column. Look at the signature of the pd.Series.fillna method:

Series.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

You'll see the default value is None. So we can replicate this error with

df.A.fillna(None)

Or equivalently

df.A.fillna()

I'll add that I'm not terribly surprised considering that you are attempting to fill a null value with a null value.


What you need is a workaround.

Solution
Use pd.DataFrame.fillna over columns that you want to fill with non-null values. Then follow that up with a pd.DataFrame.replace on the specific columns you want to swap one null value with another.

df.fillna(dict(A=1, C=2)).replace(dict(B={np.nan: None}))

     A     B  C
0  1.0  None  2
1  1.0     2  D
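Since the dictionary in the question arrives from outside, one way to apply this workaround generically is to split it into the non-null fills and the columns that should end up holding `None`. The sketch below is only an illustration: `fillna_allowing_none` is a hypothetical helper, not a pandas API, and it swaps in `astype(object)` plus `where` for the `replace` call above, on the assumption that this combination behaves consistently across recent pandas versions. The sample frame is also made up.

```python
import numpy as np
import pandas as pd


def fillna_allowing_none(df, fill_values):
    """Hypothetical helper: like df.fillna(dict), but tolerates None values.

    Columns mapped to a non-null value are filled with fillna.
    Columns mapped to None have their nulls converted to Python's None.
    """
    non_null = {k: v for k, v in fill_values.items() if v is not None}
    to_none = [k for k, v in fill_values.items() if v is None and k in df.columns]

    # fillna never sees a None value, so it cannot raise the ValueError.
    out = df.fillna(value=non_null) if non_null else df.copy()

    for col in to_none:
        # Cast to object first so the column is able to hold None.
        as_object = out[col].astype(object)
        out[col] = as_object.where(out[col].notna(), None)
    return out


# Illustrative data, loosely modelled on the question's dictionary.
df = pd.DataFrame({
    "first_name": [None, "Bob"],
    "last_name": [None, "Smith"],
    "created_at": [np.nan, 1.5],
})
fills = {"first_name": "Andrii", "last_name": "Furmanets", "created_at": None}

result = fillna_allowing_none(df, fills)
print(result)
```

The helper returns a new frame and leaves the input untouched, which keeps it safe to call on data you still need in its original form.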
piRSquared
3

What type of data structure are you using? This works for a pandas Series:

import pandas as pd

d = pd.Series({'first_name': 'Andrii', 'last_name':'Furmanets', 'created_at':None})
d = d.fillna('DATE')
atwalsh
2

An alternative way to fill with None. I am on pandas 0.24.0 and use this to insert NULL values into a PostgreSQL database.

import numpy as np
import pandas as pd

# Borrowing @piRSquared's dataframe
df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))

df

     A    B     C
0  1.0  NaN  None
1  NaN  2.0     D

# fill NaN with None. Basically it says, fill with None whenever you see NULL value.
df['A'] = np.where(df['A'].isnull(), None, df['A'])
df['B'] = np.where(df['B'].isnull(), None, df['B'])

# Result
df

      A     B     C
0   1.0  None  None
1  None   2.0     D

addicted
  • `IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices` Unfortunately, I can't give the example because they are very large dataframes with sensitive data. Any idea what could be going on? – user1717828 Nov 19 '19 at 03:34
-2

Solution: use pandas pd.NA not base Python None

df = pd.DataFrame({'first_name':[pd.NA], 'last_name':[pd.NA], 'created_at':[pd.NA]})

df.fillna(value={'first_name':'Andrii', 'last_name':'Furmanets', 'created_at':pd.NA})

Generally it's better to leave pandas NA as-is. Do not try to change it. The presence of NA is a feature, not an issue. NA gets handled correctly in other pandas functions (but not in NumPy).

  • If you insist that python None should replace pandas NA's for some downstream reason, show us the missing code that follows where NA is causing an issue; that's usually an XY problem.
smci
  • @HenryHenrinson: this absolutely is an answer (leave the NaT as is, do not replace it), is [recommended by the pandas documentation](https://pandas.pydata.org/docs/user_guide/missing_data.html#datetimes) and in most situations avoids problems down the line, pandas functions are NaT-aware. The OP hasn't shown any downstream code where NaT is actually causing an issue. I edited the answer to add that explanation. – smci Jul 12 '23 at 01:00
  • @HenryHenrinson: then NA, not NaT. This is very much still the right answer: NaN's are generally your friend in pandas and work in aggregations, joins etc. The OP still hasn't shown any downstream code where NA is actually causing an issue, hence this is an XY question: just because they insist they need to fillna, doesn't mean they need to. – smci Jul 13 '23 at 03:38
  • @HenryHenrinson: NA, NaN (and NaT) are fully pandas-compatible values; whereas base Python `None` isn't. That's the bottom line and what I've been consistently saying for 4 years. Look at the pandas doc, they don't recommend using `None` as idiomatic pandas. – smci Jul 13 '23 at 05:18