How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

Question

I have a Pandas Dataframe as below:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

When I try to apply a function to the Amount column, I get the following error:

ValueError: cannot convert float NaN to integer

I have tried applying a function using .isnan from the Math Module I have tried the pandas .replace attribute I tried the .sparse data attribute from pandas 0.9 I have also tried if NaN == NaN statement in a function. I have also looked at this article How do I replace NA values with zeros in an R dataframe? whilst looking at some other articles. All the methods I have tried have not worked or do not recognise NaN. Any Hints or solutions would be appreciated.

The only problem is df.fill.na() does not work if the data frame on which you are applying it is resampled or have been sliced through loc function — Prince Agarwal, Jun 11 '18 at 08:47

score 972 · Accepted Answer · edited Jun 06 '20 at 01:08

972

I believe DataFrame.fillna() will do this for you.

Link to Docs for a dataframe and for a Series.

Example:

In [7]: df
Out[7]: 
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]: 
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. in this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]: 
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]: 
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

EDIT:

To avoid a SettingWithCopyWarning, use the built in column-specific functionality:

df.fillna({1:0}, inplace=True)

edited Jun 06 '20 at 01:08

johnDanger

1,990
16
22

answered Nov 08 '12 at 18:54

Aman

45,819
7
35
37

1

Is it guaranteed that `df[1]` is a view rather than a copy of the original DF? Obviously, if there's a rare situation where it's a copy, it would cause a super-troublesome bug. Is there a clear statement on that in pandas documentation? – max Jan 30 '16 at 11:53
@max See this, might address your question: http://stackoverflow.com/questions/23296282/what-rules-does-pandas-use-to-generate-a-view-vs-a-copy – Aman Feb 03 '16 at 01:23
Thanks. Is my understanding correct that in that answer an "indexer that sets" is the outermost indexing operation (executed just before the assignment. So any assignment that only uses a single indexer is guaranteed to be safe, making your code safe? – max Feb 03 '16 at 16:01
@max I do not know what you mean by "safe"... but in any case, this seems off topic here. :) Probably best to comment on that other question, or post a new question. – Aman Feb 03 '16 at 18:51
More info for fillna() can be found here http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.fillna.html – prashanth Jun 03 '16 at 10:15
1

Why is this not working for me? see: http://stackoverflow.com/questions/39452095/how-to-fillna-with-value-0 – Stefan Falk Sep 12 '16 at 13:59
What's the equivalent for a list? – FaCoffee Nov 07 '16 at 20:19
1

the last example throws a SettingWithCopyWarning – Sip Jan 18 '19 at 14:57
1

@Farrukh Faizy's answer is really the way `fillna()` is designed to handle specific columns. Perhaps replace the second solution with his as your answer will always be on top?? :) – johnDanger Jun 05 '20 at 21:24

score 191 · Answer 2 · edited Oct 07 '18 at 19:25

191

It is not guaranteed that the slicing returns a view or a copy. You can do

df['column'] = df['column'].fillna(value)

edited Oct 07 '18 at 19:25

Asclepius

57,944
17
167
143

answered Oct 06 '16 at 09:10

rakesh

4,368
1
19
13

17

Just discovered the "inplace=True" problem. This answer avoids the issue and I think is the cleanest solution presented. – TimCera Apr 28 '17 at 13:53

score 66 · Answer 3 · answered Jun 15 '17 at 05:11

66

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)

answered Jun 15 '17 at 05:11

Anton Protopopov

30,354
12
88
93

1

Will it only replace `NaN` ? or it will also replace value where `NA` or `NaN` like `df.fillna(0)`? I am looking for solution which only replace value where there is `NaN` and not `NA` – Shyam Bhimani Jan 09 '20 at 16:50
1

@ShyamBhimani it should replace only `NaN` i.e. values where `np.isnan` is True – Anton Protopopov Jan 10 '20 at 15:21

score 31 · Answer 4 · edited Sep 13 '16 at 21:13

31

The below code worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

edited Sep 13 '16 at 21:13

Petter Friberg

21,252
9
60
109

answered Sep 13 '16 at 20:59

Cornel Ciobanu

552
7
15

One liner `df = pandas.read_csv('somefile.txt').fillna(0)` – Siraj Alam May 27 '22 at 14:40

score 29 · Answer 5 · edited Dec 16 '15 at 18:29

I just wanted to provide a bit of an update/special case since it looks like people still come here. If you're using a multi-index or otherwise using an index-slicer the inplace=True option may not be enough to update the slice you've chosen. For example in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)

The "problem" is that the chaining breaks the fillna ability to update the original dataframe. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer levels of indexes depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))

It's one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

If anybody can find places this doesn't work please post in the comments, I've been messing with it and looking at the source and it seems to solve at least my multi-index slice problems.

score 14 · Answer 6 · answered Jul 16 '19 at 20:02

You can also use dictionaries to fill NaN values of the specific columns in the DataFrame rather to fill all the DF with some oneValue.

import pandas as pd

df = pd.read_excel('example.xlsx')
df.fillna( {
        'column1': 'Write your values here',
        'column2': 'Write your values here',
        'column3': 'Write your values here',
        'column4': 'Write your values here',
        .
        .
        .
        'column-n': 'Write your values here'} , inplace=True)

This is the developer-intended solution to the OP's question. — johnDanger, Jun 05 '20 at 21:26

score 11 · Answer 7 · edited Jul 07 '18 at 19:03

Easy way to fill the missing values:-

filling string columns: when string columns have missing values and NaN values.

df['string column name'].fillna(df['string column name'].mode().values[0], inplace = True)

filling numeric columns: when the numeric columns have missing values and NaN values.

df['numeric column name'].fillna(df['numeric column name'].mean(), inplace = True)

filling NaN with zero:

df['column name'].fillna(0, inplace = True)

Herker · Answer 8 · 2023-03-14T09:26:43.543

9

Replace all nan with 0

df = df.fillna(0)

edited Mar 14 '23 at 09:26

answered Mar 01 '21 at 14:22

Herker

552
6
20

score 9 · Answer 9 · answered Jan 11 '22 at 06:59

There have been many contributions already, but since I'm new here, I will still give input.

There are two approaches to replace NaN values with zeros in Pandas DataFrame:

fillna(): function fills NA/NaN values using the specified method.
replace(): df.replace()a simple method used to replace a string, regex, list, dictionary

Example:

#NaN with zero on all columns
df2 = df.fillna(0)


#Using the inplace=True keyword in a pandas method changes the default behaviour.
    df.fillna(0, inplace = True)

# multiple columns appraoch
df[["Student", "ID"]] = df[["Student", "ID"]].fillna(0)

finally the replace() method :

df["Student"] = df["Student"].replace(np.nan, 0)

score 7 · Answer 10 · answered Mar 29 '19 at 19:46

7

To replace na values in pandas

df['column_name'].fillna(value_to_be_replaced,inplace=True)

if inplace = False, instead of updating the df (dataframe) it will return the modified values.

answered Mar 29 '19 at 19:46

Wickkiey

4,446
2
39
46

score 6 · Answer 11 · edited Feb 26 '19 at 11:41

6

Considering the particular column Amount in the above table is of integer type. The following would be a solution :

df['Amount'] = df.Amount.fillna(0).astype(int)

Similarly, you can fill it with various data types like float, str and so on.

In particular, I would consider datatype to compare various values of the same column.

edited Feb 26 '19 at 11:41

tuomastik

4,559
5
36
48

answered Feb 26 '19 at 11:21

Bharath_Raja

622
8
16

score 5 · Answer 12 · answered Mar 24 '20 at 23:38

5

To replace nan in different columns with different ways:

   replacement= {'column_A': 0, 'column_B': -999, 'column_C': -99999}
   df.fillna(value=replacement)

answered Mar 24 '20 at 23:38

Abo Omar

125
1
10

score 5 · Answer 13 · answered Mar 31 '21 at 03:24

5

This works for me, but no one's mentioned it. could there be something wrong with it?

df.loc[df['column_name'].isnull(), 'column_name'] = 0

answered Mar 31 '21 at 03:24

Ankhnesmerira

1,386
15
29

only this worked for me :D , df.fillna(0) didn't work (changed nothing with no error) – Leo Ma Dec 13 '21 at 15:09

score 3 · Answer 14 · answered Jul 23 '19 at 13:36

There are two options available primarily; in case of imputation or filling of missing values NaN / np.nan with only numerical replacements (across column(s):

df['Amount'].fillna(value=None, method= ,axis=1,) is sufficient:

From the Documentation:

value : scalar, dict, Series, or DataFrame Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.

Which means 'strings' or 'constants' are no longer permissable to be imputed.

For more specialized imputations use SimpleImputer():

from sklearn.impute import SimpleImputer
si = SimpleImputer(strategy='constant', missing_values=np.nan, fill_value='Replacement_Value')
df[['Col-1', 'Col-2']] = si.fit_transform(X=df[['C-1', 'C-2']])

score 2 · Answer 15 · answered May 15 '19 at 18:20

If you were to convert it to a pandas dataframe, you can also accomplish this by using fillna.

import numpy as np
df=np.array([[1,2,3, np.nan]])

import pandas as pd
df=pd.DataFrame(df)
df.fillna(0)

This will return the following:

     0    1    2   3
0  1.0  2.0  3.0 NaN
>>> df.fillna(0)
     0    1    2    3
0  1.0  2.0  3.0  0.0

score 2 · Answer 16 · answered Nov 07 '20 at 19:03

If you want to fill NaN for a specific column you can use loc:

d1 = {"Col1" : ['A', 'B', 'C'],
     "fruits": ['Avocado', 'Banana', 'NaN']}
d1= pd.DataFrame(d1)

output:

Col1    fruits
0   A   Avocado
1   B   Banana
2   C   NaN


d1.loc[ d1.Col1=='C', 'fruits' ] =  'Carrot'


output:

Col1    fruits
0   A   Avocado
1   B   Banana
2   C   Carrot

score 1 · Answer 17 · answered Mar 29 '22 at 22:38

I think it's also worth mention and explain the parameters configuration of fillna() like Method, Axis, Limit, etc.

From the documentation we have:

Series.fillna(value=None, method=None, axis=None, 
                 inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.

Parameters

value [scalar, dict, Series, or DataFrame] Value to use to 
 fill holes (e.g. 0), alternately a dict/Series/DataFrame 
 of values specifying which value to use for each index 
 (for a Series) or column (for a DataFrame). Values not in 
 the dict/Series/DataFrame will not be filled. This 
 value cannot be a list.

method [{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, 
 default None] Method to use for filling holes in 
 reindexed Series pad / ffill: propagate last valid 
 observation forward to next valid backfill / bfill: 
 use next valid observation to fill gap axis 
 [{0 or ‘index’}] Axis along which to fill missing values.

inplace [bool, default False] If True, fill 
 in-place. Note: this will modify any other views
 on this object (e.g., a no-copy slice for a 
 column in a DataFrame).

limit [int,defaultNone] If method is specified, 
 this is the maximum number of consecutive NaN 
 values to forward/backward fill. In other words, 
 if there is a gap with more than this number of 
 consecutive NaNs, it will only be partially filled. 
 If method is not specified, this is the maximum 
 number of entries along the entire axis where NaNs
 will be filled. Must be greater than 0 if not None.

downcast [dict, default is None] A dict of item->dtype 
 of what to downcast if possible, or the string ‘infer’ 
 which will try to downcast to an appropriate equal 
 type (e.g. float64 to int64 if possible).

Ok. Let's start with the method= Parameter this have forward fill (ffill) and backward fill(bfill) ffill is doing copying forward the previous non missing value.

e.g. :

import pandas as pd
import numpy as np
inp = [{'c1':10, 'c2':np.nan, 'c3':200}, {'c1':np.nan,'c2':110, 'c3':210}, {'c1':12,'c2':np.nan, 'c3':220},{'c1':12,'c2':130, 'c3':np.nan},{'c1':12,'c2':np.nan, 'c3':240}]
df = pd.DataFrame(inp)

  c1       c2      c3
0   10.0     NaN      200.0
1   NaN   110.0 210.0
2   12.0     NaN      220.0
3   12.0     130.0 NaN
4   12.0     NaN      240.0

Forward fill:

df.fillna(method="ffill")

    c1     c2      c3
0   10.0      NaN 200.0
1   10.0    110.0   210.0
2   12.0    110.0   220.0
3   12.0    130.0   220.0
4   12.0    130.0   240.0

Backward fill:

df.fillna(method="bfill")

    c1      c2     c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0

The Axis Parameter help us to choose the direction of the fill:

Fill directions:

ffill:

Axis = 1 
Method = 'ffill'
----------->
  direction 

df.fillna(method="ffill", axis=1)

       c1   c2      c3
0   10.0     10.0   200.0
1    NaN    110.0   210.0
2   12.0     12.0   220.0
3   12.0    130.0   130.0
4   12.0    12.0    240.0

Axis = 0 # by default 
Method = 'ffill'
|
|       # direction 
|
V
e.g: # This is the ffill default
df.fillna(method="ffill", axis=0)

    c1     c2      c3
0   10.0      NaN   200.0
1   10.0    110.0   210.0
2   12.0    110.0   220.0
3   12.0    130.0   220.0
4   12.0    130.0   240.0

bfill:

axis= 0
method = 'bfill'
^
|
|
|
df.fillna(method="bfill", axis=0)

    c1     c2      c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0

axis = 1
method = 'bfill'
<-----------
df.fillna(method="bfill", axis=1)
        c1     c2       c3
0    10.0   200.0   200.0
1   110.0   110.0   210.0
2    12.0   220.0   220.0
3    12.0   130.0     NaN
4    12.0   240.0   240.0

# alias:
#  'fill' == 'pad' 
#   bfill == backfill

limit parameter:

df
    c1     c2      c3
0   10.0      NaN   200.0
1    NaN    110.0   210.0
2   12.0      NaN   220.0
3   12.0    130.0     NaN
4   12.0      NaN   240.0

Only replace the first NaN element across columns:

df.fillna(value = 'Unavailable', limit=1)
            c1           c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0         NaN       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0

df.fillna(value = 'Unavailable', limit=2)

           c1            c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0 Unavailable       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0

downcast parameter:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      4 non-null      float64
 1   c2      2 non-null      float64
 2   c3      4 non-null      float64
dtypes: float64(3)
memory usage: 248.0 bytes

df.fillna(method="ffill",downcast='infer').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      5 non-null      int64  
 1   c2      4 non-null      float64
 2   c3      5 non-null      int64  
dtypes: float64(1), int64(2)
memory usage: 248.0 bytes

How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

17 Answers17

Replace all nan with 0

Linked

Related