Replacing NaN with null python pandas

Question

I am trying to write from a pandas dataframe to AWS redshift:

df_tmp_rpt = pd.read_csv('path')
df_tmp_rpt = df_tmp_rpt[df_tmp_rpt['COL'] == 'VALUE']
df_tmp_rpt = df_tmp_rpt.replace(np.nan, null, regex=True)
records = df_tmp_rpt.to_records(index=False)
for record in records:
     script_insert = ScriptReader.get_script(SCRIPT_PATH).format(record)
     RedshiftDataManager.run_update(script_insert, DB_CONNECTION)

Redshift expects the format ('value1','value2',null) for inserting data. That is why i try to replace all NaN with null in the dataframe. How would I achieve such thing? (I need a null value not the string 'null')

Thanks for help in advance

Does this answer your question? [Pandas dataframe fillna() only some columns in place](https://stackoverflow.com/questions/38134012/pandas-dataframe-fillna-only-some-columns-in-place) — sushanth, May 27 '20 at 13:47
@Sushanth I saw the topic indeed but it does not. I need all the NaN values replaced with null, not with 'None' or '0' — PyPy2000, May 27 '20 at 13:51

Himanshu Punjabi · Answer 1 · 2021-03-19T16:17:23.283

3

This is what worked for me.

df_tmp_rpt = df_tmp_rpt.where(df_tmp_rpt.notna(), None)

This will replace all the NaN values in your Dataframe to None. None is loaded as NULL in the Database. This works in MS SQL.

edited Mar 19 '21 at 16:17

answered Mar 18 '21 at 14:20

Himanshu Punjabi

41
5

dkreeft · Answer 2 · 2020-05-27T18:04:39.863

1

There is no null in Python. In AWS Redshift, a null is when a value is missing or unknown. Replacing NaN with an empty string might thus work. Consider using df_tmp_rpt.fillna(value=[None]) instead of using replace().

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas-dataframe-fillna

edited May 27 '20 at 18:04

answered May 27 '20 at 14:24

dkreeft

642
5
17

Thanks for clarifying. That means I would have to find a different solution for my problem since the integer fields expect null instead of an empty string – PyPy2000 May 27 '20 at 14:28
I believe passing an empty string is interpreted by Redshift as null. – dkreeft May 27 '20 at 14:33
That is what I thought too. After that I get an error saying that '' cant be entered into an integer field. Error is: invalid input syntax for integer: "" – PyPy2000 May 27 '20 at 14:37
I see, Amazon's documentation is not consistent. [Here](https://docs.aws.amazon.com/redshift/latest/dg/r_Nulls.html) I read that indeed '' is not equal to null. I will update my answer with `None` instead of '', I believe that could work. – dkreeft May 27 '20 at 14:50
Excuse me I should have provided you with that piece of documentation. Where should I change the NaN values to None in my code? Running: df_tmp_rpt.replace(np.nan, None, inplace=True) returns the same thing as when I do not replace anything – PyPy2000 May 27 '20 at 14:57
Try turning `None` into a list, see: https://stackoverflow.com/questions/17097236/replace-invalid-values-with-none-in-pandas-dataframe – dkreeft May 27 '20 at 18:05
That does indeed change it to None. However, Redshift still does not recognize None as a valid datatype since it needs to be null – PyPy2000 May 28 '20 at 09:54

Replacing NaN with null python pandas

2 Answers2