Pandas Replace NaN with blank/empty string

Question

I have a Pandas Dataframe as shown below:

    1    2       3
 0  a  NaN    read
 1  b    l  unread
 2  c  NaN    read

I want to replace the NaN values with an empty string so that it looks like this:

    1    2       3
 0  a   ""    read
 1  b    l  unread
 2  c   ""    read

fantabolous · Answer 1 · 2023-05-10T08:28:06.700

625

df = df.fillna('')

This will fill NA values (e.g. NaN) with ''.

inplace=True is possible but best avoided, since it makes a copy internally anyway and it will be deprecated:

df.fillna('', inplace=True)

To fill only a single column:

df.column1 = df.column1.fillna('')

One can use df['column1'] instead of df.column1.
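As a minimal sketch of the two forms above, the snippet below rebuilds the frame from the question (column labels 1, 2, 3) and applies fillna to the whole frame and to a single column. The variable names filled and one_col are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (column labels 1, 2, 3).
df = pd.DataFrame({
    1: ['a', 'b', 'c'],
    2: [np.nan, 'l', np.nan],
    3: ['read', 'unread', 'read'],
})

# Whole frame: fillna returns a new DataFrame; df itself is unchanged.
filled = df.fillna('')

# Single column: assign the filled column back (done on a copy here).
one_col = df.copy()
one_col[2] = one_col[2].fillna('')

print(filled)
```

Because fillna returns a new object, the original df still holds its NaN values unless the result is assigned back.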

edited May 10 '23 at 08:28

answered Feb 08 '15 at 05:44

fantabolous


15

@Mithril - `df[['column1','column2']] = df[['column1','column2']].fillna('')` – elPastor Oct 12 '17 at 01:29
2

This is giving me `SettingWithCopyWarning` – jss367 Nov 11 '20 at 22:44
4

@jss367 That's not due to this code, but rather because you've earlier created a partial view of a larger df. Very good answer here https://stackoverflow.com/a/53954986/3427777 – fantabolous Jan 26 '21 at 11:54
I'm curious as to why `str(np.nan)` doesn't return an empty string, which would seem to me to be the logical result. I'm sure it has something to do with the inner workings of the sausage factory. Can anyone point me to a good explanation? – JJL Jun 24 '21 at 22:14

score 395 · Accepted Answer · edited Mar 08 '17 at 14:05

395

import numpy as np
df1 = df.replace(np.nan, '', regex=True)

This might help. It will replace all NaNs with an empty string.
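A small sketch of the same call on a frame that also has a numeric column. The column names are made up for illustration, and the call is shown without regex=True, on the assumption that a plain NaN-to-string replacement does not need regex matching.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string column and one numeric column.
df = pd.DataFrame({
    'flag': [np.nan, 'l', np.nan],
    'n': [1.0, np.nan, 3.0],
})

# Replace every NaN with an empty string; returns a new DataFrame.
replaced = df.replace(np.nan, '')

# The numeric column now mixes floats and strings, so it is object dtype.
print(replaced)
print(replaced.dtypes)
```

Note the side effect on numeric columns: once a float column contains '', it can no longer be used for arithmetic without converting it back.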

edited Mar 08 '17 at 14:05

Ninjakannon


answered Nov 10 '14 at 06:40

nEO


1

what library does `np.nan` come from? I can't use it – CaffeineConnoisseur Aug 05 '16 at 22:33
11

@CaffeineConnoisseur: `import numpy as np`. – John Zwinck Aug 08 '16 at 21:56
52

@CaffeineConnoisseur - or just `pd.np.nan` if you don't want to `import numpy` as well. – elPastor Oct 12 '17 at 01:27
1

This also allows the Dict to be saved as a string in the row of a .csv and then subsequently read back into a DataFrame using the `pd.DataFrame.from_dict(eval(_string_))` – yeliabsalohcin Aug 07 '18 at 11:02
8

Also useful to mention the `... inplace=True` option. – smci May 24 '19 at 23:02
3

@CaffeineConnoisseur,@elPastor - `pandas 1.0.3` warns of `pandas.np` deprecation in future versions. It was nice having it! – Gathide May 05 '20 at 13:11
You can use `float('nan')` instead of `np.nan`. – Asclepius May 15 '20 at 00:01
3

You can also use `pd.NA` instead of `pd.np.nan` since 1.0.0 – lucidyan Mar 10 '21 at 15:58

Natesh bhat · Answer 3 · 2022-01-15T10:24:51.217

168

If you are reading the DataFrame from a file (say CSV or Excel), then use:

df = pd.read_csv(path, na_filter=False)

df = pd.read_excel(path, na_filter=False)

This will automatically treat empty fields as empty strings ''.

If you already have the DataFrame, use either of:

df = df.replace(np.nan, '', regex=True)

df = df.fillna('')
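To see the effect of na_filter=False without a file on disk, here is a minimal sketch that reads an in-memory CSV through io.StringIO. The CSV content and column names are invented for the example.

```python
import io

import pandas as pd

# Invented CSV text; the 'note' field is empty in rows 1 and 3.
csv_text = "id,status,note\n1,read,\n2,unread,l\n3,read,\n"

# Default behaviour: empty fields are parsed as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# With na_filter=False, no NA detection is done and empty fields stay ''.
unfiltered = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(default)
print(unfiltered)
```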

edited Jan 15 '22 at 10:24

answered Jul 19 '17 at 15:16

Natesh bhat


na_filter is not available on read_excel() http://pandas.pydata.org/pandas-docs/stable/search.html?q=na_filter&check_keywords=yes&area=default – Marjorie Roswell Jul 31 '17 at 02:39
I have used it in my application. It does exist, but for some reason they haven't listed this argument in the docs. It works well for me, without errors. – Natesh bhat Aug 01 '17 at 06:40
It works; I'm using it in parse: `xl.parse('sheet_name', na_filter=False)` – Dmitrii Nov 22 '17 at 17:33
I trawled through so many different threads for a fix and this is the only one that worked for my CSV file. Thanks. – Deskjokey Jan 09 '22 at 09:52

score 10 · Answer 4 · edited May 24 '19 at 23:29

10

Use a formatter if you only want the output to render nicely when printed. The formatters argument of df.to_string(...) lets you define custom string formatting without needlessly modifying your DataFrame or wasting memory:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [np.nan, 1, np.nan],
    'C': ['read', 'unread', 'read']})
# Render NaN in column B as an empty string; the DataFrame itself is unchanged
print(df.to_string(
    formatters={'B': lambda x: '' if pd.isnull(x) else '{:.0f}'.format(x)}))

To get:

   A B       C
0  a      read
1  b 1  unread
2  c      read

edited May 24 '19 at 23:29

smci


answered Jun 21 '18 at 22:41

Steve Schulist


4

`print df.fillna('')` by itself (without doing `df = df.fillna('')`) doesn't modify the original either. Is there a speed or other advantage to using `to_string`? – fantabolous Nov 27 '18 at 03:10
Fair enough, `df.fillna('')` it is! – Steve Schulist Nov 28 '18 at 15:35
@shadowtalker: Not necessarily, it would only be the correct answer if the OP wanted to keep the df in one format (e.g. more computationally-efficient, or saving memory on unnecessary/empty/duplicate strings), yet render it visually in a more pleasing way. Without knowing more about the use-case, we can't say for sure. – smci May 24 '19 at 23:05

Vineesh TP · Answer 5 · 2021-04-30T06:21:37.340

7

Try this, with inplace=True added:

import numpy as np
df.replace(np.nan, '', inplace=True)

edited Apr 30 '21 at 06:21

answered Aug 23 '19 at 12:27

Vineesh TP


This is not an empty string: `''` and `' '` are not equivalent. While the first is treated as `False`, the value used above will be treated as `True`. – suvayu Apr 28 '21 at 09:26

score 4 · Answer 6 · answered Jun 28 '19 at 09:29

4

Using keep_default_na=False should help you:

df = pd.read_csv(filename, keep_default_na=False)
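A minimal sketch of what keep_default_na=False changes, again using an in-memory CSV with invented contents. With the default NA list switched off and no na_values given, both empty fields and literal strings such as "NA" are kept as text.

```python
import io

import pandas as pd

# Invented CSV text: one literal "NA" value and one empty field.
csv_text = "code,region\n1,NA\n2,\n3,EU\n"

# Default: both "NA" and the empty field are parsed as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: "NA" stays a string and the empty field becomes ''.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default)
print(kept)
```

This matters when "NA" is real data (for example a region or country code) rather than a missing-value marker.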

answered Jun 28 '19 at 09:29

Bendy Latortue


score 0 · Answer 7 · edited May 17 '19 at 11:11

0

If you are converting a DataFrame to JSON, NaN can cause an error, so the best solution in this use case is to replace NaN with None.
Here is how:

df1 = df.where((pd.notnull(df)), None)

edited May 17 '19 at 11:11

taras


answered Mar 15 '18 at 20:48

Dinesh Khetarpal


score 0 · Answer 8 · answered Jul 04 '19 at 04:07

I tried this on a single column of string values containing NaN.

To replace NaN with an empty string:

df.columnname.replace(np.nan, '', regex=True)

To replace NaN with some other value:

df.columnname.replace(np.nan, 'value', regex=True)

I also tried df.iloc, but it needs the positional index of the column, so you have to look at the table again. The method above simply saves that step.
