Replacing Pandas or Numpy Nan with a None to use with MysqlDB

Question

I am trying to write a Pandas dataframe (or can use a numpy array) to a MySQL database using MysqlDB. MysqlDB doesn't seem to understand 'nan', and my database throws an error saying nan is not in the field list. I need to find a way to convert the 'nan' into a NoneType.

Any ideas?

Is there no setting you can change in Pandas to make it return `None` for `NULL` instead of `nan`? — Nathan Hinchey, Apr 12 '17 at 15:27

score 329 · Accepted Answer · edited Jan 26 '20 at 17:40

@bogatron has it right: you can use where. It's worth noting that you can do this natively in pandas:

df1 = df.where(pd.notnull(df), None)

Note: this changes the dtype of all columns to object.

Example:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]: 
    0
0   1
1 NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]: 
      0
0     1
1  None

Note: what you cannot do is recast the DataFrame's dtype to allow all data types, using astype, and then call the DataFrame fillna method with None. The closest variant inserts the string 'None', not the None object:

df1 = df.astype(object).replace(np.nan, 'None')

Unfortunately, neither this nor using replace works with None; see this (closed) issue.

As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.

However, in this specific case it seems you do (at least at the time of this answer).
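
For the original goal of writing rows to a database, one possible sketch is shown below. It is not the answer's own code: it casts to object before masking (as later answers and comments suggest for newer pandas versions) and uses the standard-library sqlite3 module purely as a stand-in for MysqlDB. The table `t` and its columns are made up for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", None], "value": [1.5, np.nan, 3.0]})

# Cast to object first so None is not coerced back to NaN, then mask missing cells.
clean = df.astype(object).where(df.notna(), None)

# Plain tuples of Python objects, suitable for a parameterized executemany().
rows = list(clean.itertuples(index=False, name=None))

# sqlite3 is only a stand-in driver; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value REAL)")
conn.executemany("INSERT INTO t (name, value) VALUES (?, ?)", rows)
null_count = conn.execute("SELECT COUNT(*) FROM t WHERE value IS NULL").fetchone()[0]
conn.close()
```

Passing the values as query parameters (rather than formatting them into the SQL string) is what lets the driver translate None into NULL.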

edited Jan 26 '20 at 17:40 by EliadL · answered Jan 04 '13 at 19:01 by Andy Hayden

see docs http://pandas.pydata.org/pandas-docs/stable/indexing.html#where-and-masking – Jeff Jan 05 '13 at 01:52
FWIW..this will also change the dtype of the columns to object, you probably don't care though – Jeff Jan 05 '13 at 02:03
@Jeff Thanks for the link, weirdly I couldn't find it earlier! I figured it had to change the dtype to allow None, definitely worth mentioning! – Andy Hayden Jan 05 '13 at 12:22
useful to use before inserting with Django to avoid the `np.nan` being converted to string `"nan"` – Shadi May 11 '18 at 08:04
Useful caveat. Makes sense to loop through only those columns that are already `dtype` of `object` and do it for those and handle other types differently as needed. Ideally, `fillna(None)` would be terrific. – Vishal Sep 30 '18 at 03:17
An important use case is when converting to JSON. Not all languages support NaNs in JSON (such as PHP), so they need to be converted to None. This is something I've run into quite a bit as a data scientist. – bpachev Jun 13 '20 at 01:34
Is this method still working? Currently only @EliadL 's answer below worked without errors for me, at least in pandas version `1.0.3` . – petobens Aug 10 '20 at 04:44
Using `df.where(pd.notnull(df), None)` no longer works in 1.3.0 - instead I found the next answer from @EliadL to still work fine: https://stackoverflow.com/a/54403705/2407819 – Sebastian Hätälä Jul 07 '21 at 15:47
This only works for certain data types, example: it changes NaN to None in string columns but not for Float type, that are not modified. Going straight to numpy as said above worked for me `df = df.replace({np.nan: None})` – Alejo Garcia Bondarenko Aug 10 '22 at 19:02
This works for me. It also preserves my column dtypes for columns that don't have NaNs in them (I'm using pandas 1.2.4) – Eddy Sep 07 '22 at 09:15
Please update this code with ```replace(np.NaN, None)``` as the example code no longer works after 1.3.0 see https://github.com/pandas-dev/pandas/issues/42423 – Dave Lawrence Dec 12 '22 at 11:11

EliadL · Answer 2 · 2022-11-15T10:48:14.150

315

df = df.replace({np.nan: None})

Note: For pandas versions <1.4, this changes the dtype of all affected columns to object.
To avoid that, use this syntax instead:

df = df.replace(np.nan, None)

Credit goes to this guy here on this Github issue and Killian Huyghe's comment.
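
A small sketch for checking what replace actually produced on your own pandas version (the column names are illustrative, and the outcome is version-dependent, as the note above and the comments below point out). It inspects the Python objects directly, since NaN and None are easy to confuse in printed output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, np.nan, 2.5]})
out = df.replace({np.nan: None})

# Look at the stored objects and the resulting dtypes, not the printed frame.
values = out["value"].tolist()
dtypes = out.dtypes.astype(str).to_dict()
```

On recent pandas releases I would expect `values[1] is None` to hold and the `value` column to be of dtype object; if it does not on your version, one of the where-based answers may be the safer route.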

edited Nov 15 '22 at 10:48 · answered Jan 28 '19 at 14:07 by EliadL

this is the best answer as you can use `df.replace({np.nan: None})` as a temp object – Matt Jun 19 '20 at 12:32
if the values in `df` are already `None` this answer will toggle them back to `np.nan` – Max Segal Jun 06 '21 at 11:44
@MaxSegal How is that? I haven't found this in `replace()` documentation. Can you refer me to where this is mentioned in the docs? – Ammar Alyousfi Jul 25 '21 at 12:50
@AmmarAlyousfi `to = {np.nan: None}; assert df.replace(to).replace(to).equals(df)` – EliadL Jul 25 '21 at 14:53
it does not toggle them back for me, nor does the documentation indicate it would. – cfelix Dec 21 '21 at 17:12
I see the same behavior as @MaxSegal. `np.nan` is transformed into `None`, while `None` is transformed into `np.nan` – swimmer Jan 24 '22 at 18:21
**for pandas versions <1.3.0** if the values in `df` are already `None` this answer will toggle them back to `np.nan` – Max Segal Jan 26 '22 at 11:33
I had the issue of them being toggled back on version >1.3.0. My column was categorical. When I switched it to object it worked again. Perhaps that may be the cause. – hawkar Jan 27 '22 at 15:02
For pandas versions <1.4, there is a bug when using a dict in replace and your column dtypes may change unexpectedly, you should prefer this syntax instead: `df = df.replace(np.nan, None)`. See https://github.com/pandas-dev/pandas/issues/35268 – Killian Huyghe Nov 15 '22 at 02:13

score 28 · Answer 3 · answered Jan 04 '13 at 18:57

You can replace nan with None in your numpy array:

>>> import numpy as np
>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print(y)
[1.0 None 3.0]
>>> print(type(y[1]))
<class 'NoneType'>

answered Jan 04 '13 at 18:57 by bogatron

The only potential concern is the change of `dtype`, `x.dtype` is `dtype('float64')` ,while `y.dtype` is `dtype('object')`. – Jaime Jan 05 '13 at 04:23
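
To make the dtype change concrete, here is a minimal sketch on a 2-D array (the data is made up). The result is an object array, and `tolist()` turns it into plain Python rows that a database driver can consume.

```python
import numpy as np

x = np.array([[1.0, np.nan], [np.nan, 4.0]])

# np.isnan only works on numeric arrays; the result of np.where is dtype=object.
y = np.where(np.isnan(x), None, x)

# Plain Python tuples, one per row.
rows = [tuple(r) for r in y.tolist()]
```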

score 25 · Answer 4 · answered Aug 02 '17 at 19:47

After stumbling around, this worked for me:

df = df.astype(object).where(pd.notnull(df),None)

answered Aug 02 '17 at 19:47 by rodney cox

This seems to be required on newer versions of pandas. The `where` and `replace` methods both get converted back to `NaN` when applied to a `pd.Categorical` column – camraynor Feb 28 '22 at 23:27

Max Segal · Answer 5 · 2023-07-12T13:37:45.507

10

Replacing np.nan with None is accomplished differently across different versions of pandas:

from packaging import version  # assumption: version.parse comes from the 'packaging' library

if version.parse(pd.__version__) >= version.parse('1.3.0'):
    df = df.replace({np.nan: None})
else:
    df = df.where(pd.notnull(df), None)

This solves the issue that, for pandas versions <1.3.0, if the values in df are already None, then df.replace({np.nan: None}) will toggle them back to np.nan and vice versa.
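
An alternative to branching on the version number is to check the result and fall back if any NaN survived. The sketch below is only one way to do that; `count_float_nans` and `nan_to_none` are hypothetical helper names, not pandas functions.

```python
import math

import numpy as np
import pandas as pd


def count_float_nans(df):
    """Count cells that still hold a float NaN (None and other objects are ignored)."""
    cells = df.to_numpy(dtype=object).ravel()
    return sum(isinstance(v, float) and math.isnan(v) for v in cells)


def nan_to_none(df):
    """Return a copy of df with NaN replaced by None, verified after the fact."""
    out = df.replace({np.nan: None})
    if count_float_nans(out):
        # Fall back to the object-cast variant of the where approach.
        out = df.astype(object).where(df.notna(), None)
    if count_float_nans(out):
        raise RuntimeError("NaN values survived the conversion")
    return out


df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})
result = nan_to_none(df)
```

The check scans every cell, so it is meant for modestly sized frames just before an insert, not for hot loops.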

edited Jul 12 '23 at 13:37 · answered Jan 26 '22 at 11:37 by Max Segal

score 9 · Answer 6 · answered Oct 10 '19 at 13:35

Another addition: be careful when replacing multiple values and converting the type of the column back from object to float. If you want to be certain that your Nones won't flip back to np.nan, apply @andy-hayden's suggestion of using df.where. An illustration of how replace can still go 'wrong':

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({"a": [1, np.NAN, np.inf]})

In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf

In [5]: df.replace({np.NAN: None})
Out[5]:
      a
0     1
1  None
2   inf

In [6]: df.replace({np.NAN: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN

In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN
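
One way to sidestep the multi-value replace shown above (a sketch, not part of the original answer) is to fold the infinities into NaN while the column is still float, and only then convert NaN to None in a single step on an object-dtype copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.inf, -np.inf]})

# Step 1: turn +/-inf into NaN while the column is still float64.
no_inf = df.replace([np.inf, -np.inf], np.nan)

# Step 2: convert every NaN to None in one pass on an object-dtype copy.
clean = no_inf.astype(object).where(no_inf.notna(), None)
values = clean["a"].tolist()
```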

answered Oct 10 '19 at 13:35 by gaatjeniksaan

Thanks for adding this. Going over the documentation again, I still can't understand this behavior. Anyway, this can be worked around by chaining yet another `.replace({np.nan: None})` – EliadL Dec 02 '19 at 09:02
Yes, you could finish by adding another `replace({np.nan: None})`. My comment was added to point out the potential pitfall when replacing `np.nan`'s. The above certainly tripped me out for a bit! – gaatjeniksaan Dec 03 '19 at 14:15

score 8 · Answer 7 · answered Apr 29 '19 at 04:21

Just an addition to @Andy Hayden's answer:

Since DataFrame.mask is the opposite twin of DataFrame.where, they have exactly the same signature but the opposite meaning:

DataFrame.where replaces values where the condition is False.
DataFrame.mask replaces values where the condition is True.

So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.
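
A short sketch of the two spellings side by side. It casts to object first, which is an addition borrowed from other answers in this thread (without the cast, a float column may keep NaN on newer pandas versions); the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, "x"]})
obj = df.astype(object)

# mask replaces where the condition is True ...
masked = obj.mask(df.isna(), other=None)

# ... where replaces where the condition is False, so the condition is inverted.
whered = obj.where(df.notna(), other=None)
```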

score 4 · Answer 8 · answered Nov 09 '16 at 14:48

Quite old, yet I stumbled upon the very same issue. Try doing this:

df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)

answered Nov 09 '16 at 14:48 by redacted

doesn't work if column data type is numeric because None just gets converted back into nan (pandas 0.23) – Shadi Nov 22 '18 at 07:59
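
A possible workaround for the coercion described in the comment above is to build the result as an explicit object-dtype Series, so pandas has no chance to infer a float dtype. This is a sketch with made-up data, not the answer author's code.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Build the values in plain Python and pin the dtype to object.
converted = pd.Series(
    [None if pd.isna(v) else v for v in s],
    index=s.index,
    dtype=object,
)
```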

score 2 · Answer 9 · answered Jul 13 '21 at 12:51

I believe the cleanest way would be to make use of the na_value argument in the pandas.DataFrame.to_numpy() method (docs):

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

New in version 1.1.0.

You could e.g. convert to dictionaries with NaN's replaced by None using

columns = df.columns.tolist()
dicts_with_nan_replaced = [
    dict(zip(columns, x))
    for x in df.to_numpy(na_value=None)
]

Your code keeps NaN as NaN, but you can fix it if you also pass `dtype=object`. — EliadL, Sep 30 '21 at 07:07
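
Following the comment above, a sketch of the same idea with `dtype=object` passed as well (the data is made up; `na_value` requires pandas 1.1.0 or later, per the documentation excerpt quoted in the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

columns = df.columns.tolist()
records = [
    dict(zip(columns, row))
    # dtype=object lets the array hold None; na_value=None fills the missing cells.
    for row in df.to_numpy(dtype=object, na_value=None)
]
```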

score 2 · Answer 10 · answered Jan 30 '23 at 16:05

Sometimes it is better to use this code. Note that np refers to numpy:

df = df.fillna(np.nan).replace([np.nan], [None])

answered Jan 30 '23 at 16:05 by Hesam Nikzad Jamnani

Why is `.fillna(np.nan)` needed here? – EliadL Jan 31 '23 at 12:49
In my case replace doesn't work without it. It seems it's needed to convert na to a numpy object first, then use it in replace method. – Hesam Nikzad Jamnani Feb 01 '23 at 13:08
By "na" which value are you referring to, exactly? – EliadL Feb 01 '23 at 13:59
I meant Pandas NaN – Hesam Nikzad Jamnani Feb 08 '23 at 09:16

score 1 · Answer 11 · answered Dec 01 '21 at 07:07

Convert numpy NaN to pandas NA before replacing with the where statement:

df = df.replace(np.NaN, pd.NA).where(df.notnull(), None)

answered Dec 01 '21 at 07:07 by Jumikru

score 1 · Answer 12 · answered May 12 '22 at 13:43

Astoundingly, none of the previous answers worked for me, so I had to do it for each column.

for column in df.columns:
    df[column] = df[column].where(pd.notnull(df[column]), None)

answered May 12 '22 at 13:43 by Berel Levy

It would be useful if you can explain why the other answers did not work and how this one helps. – Yuvraj Jaiswal May 17 '22 at 10:40
@YuvrajJaiswal I don't know why it didn't work, likewise I don't know exactly why my version works lol. I suppose series.where is more straight forward. – Berel Levy May 17 '22 at 20:28

score 0 · Answer 13 · answered Oct 05 '21 at 19:52

Do you have a code block to review by chance?

Using .loc, pandas can access records based on logical conditions (filtering) and act on them (when using =). Assigning a value through a .loc mask changes the DataFrame in place (so be a touch careful here; I suggest testing on a copy of the df first).

df.loc[df['SomeColumn'].isna(), 'SomeColumn'] = None

The outer form is df.loc[row_label, column_label] = None. For row_label we use a boolean mask, df['SomeColumn'].isna(), which is True for every row where SomeColumn holds one of the missing values that .isna() checks for (such as None or NaN).

The column_label is used twice: once when building the row mask, and once to identify the column we want to assign to.

Finally, we set the .loc selection equal to None, so the selected rows/records are changed to None.

Below are links to pandas documentation regarding .loc & .isna().

References:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isna.html
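
One caveat worth checking on your own data: as far as I can tell, the column's dtype decides whether the assigned None is kept. The sketch below (made-up data, same hypothetical column name as above) compares a float column with an object column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SomeColumn": [1.0, np.nan, 3.0]})

# On a float column, the assigned None is stored as NaN again.
as_float = df.copy()
as_float.loc[as_float["SomeColumn"].isna(), "SomeColumn"] = None
float_values = as_float["SomeColumn"].tolist()

# After casting the column to object, the same assignment keeps None.
as_object = df.copy()
as_object["SomeColumn"] = as_object["SomeColumn"].astype(object)
as_object.loc[as_object["SomeColumn"].isna(), "SomeColumn"] = None
object_values = as_object["SomeColumn"].tolist()
```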

score 0 · Answer 14 · answered Dec 02 '21 at 20:16

After finding that neither the recommended answer nor the suggested alternative worked for my application after a pandas update to 1.3.2, I settled for safety with a brute-force approach:

import json

buf = df.to_json(orient='records')
recs = json.loads(buf)
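
A self-contained sketch of the round trip on made-up data. The result is a list of plain dicts, not a DataFrame, and every value passes through JSON, so types such as timestamps may not come back in their original form.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

# Missing values are serialized as JSON null, which json.loads maps to None.
recs = json.loads(df.to_json(orient="records"))
```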

answered Dec 02 '21 at 20:16 by Kevin

score 0 · Answer 15 · answered Mar 04 '22 at 15:35

Yet another option, that actually did the trick for me:

df = df.astype(object).replace(np.nan, None)

answered Mar 04 '22 at 15:35 by SLuck

score 0 · Answer 16 · answered May 30 '22 at 16:16

Doing it by hand is the only way that is working for me right now.

This answer from @rodney cox worked for me in almost every case.

The following code sets all columns to the object data type and then replaces any null value with None. Setting the column data type to object is crucial because it prevents pandas from changing the type further.

for col in df.columns:
    df[col] = df[col].astype(object)
    df.loc[df[col].isnull(), col] = None

Warning: This solution is not efficient, because it processes columns that might not have np.nan values.

score 0 · Answer 17 · answered Mar 28 '23 at 23:18

This should work: df["column"]=df["column"].apply(lambda x: None if pd.isnull(x) else x)

answered Mar 28 '23 at 23:18 by Marcelo Brisac

score -3 · Answer 18 · answered Nov 25 '21 at 13:54

This worked for me:

df = df.fillna(0)

answered Nov 25 '21 at 13:54 by wfolkerts
