How to set a cell to NaN in a pandas dataframe

Question

I'd like to replace bad values in a column of a dataframe with NaNs.

import numpy as np
import pandas as pd

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

df[df.y == 'N/A']['y'] = np.nan

However, the last line fails with a warning because it operates on a copy of df. What is the correct way to handle this? I've seen many solutions that use iloc or ix, but here I need to use a boolean condition.

I feel like the title is misleading. The problem isn't that you want NaN in your dataframe. The problem is that you're "trying to be set on a copy of a slice from a DataFrame". — Teepeemm, Sep 11 '20 at 20:46

score 189 · Accepted Answer · answered Jan 14 '16 at 16:02

Just use replace (it returns a new DataFrame, so assign the result back if you want to keep it):

In [106]:
df.replace('N/A', np.nan)

Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

What you're attempting is called chained indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

You can use loc to ensure you operate on the original df:

In [108]:
df.loc[df['y'] == 'N/A','y'] = np.nan
df

Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
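
To make the difference concrete, here is a minimal, self-contained sketch of the loc approach, assuming the sample data from the question. The variable names bad and n_missing are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

# Boolean mask marking the rows whose 'y' value should be blanked out.
bad = df['y'] == 'N/A'

# The row selector and the column label go into a single .loc call,
# so the assignment targets df itself rather than an intermediate copy.
df.loc[bad, 'y'] = np.nan

# Count the missing entries that now exist in the column.
n_missing = int(df['y'].isna().sum())
print(n_missing)
```

The chained form df[bad]['y'] = np.nan first builds df[bad], a separate object, and then assigns into that object, which is why the original df is left unchanged.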

EdChum

With this solution you also have to import numpy as np. With pd.NA there is no need to import numpy. – Armando Contestabile Oct 07 '22 at 14:29

stallingOne · Answer 2 · 2021-11-18T04:57:40.173

40

Most of the answers above require importing an external module: import numpy as np

pandas itself has a built-in alternative, pd.NA, which can be used like this:

df.replace('N/A', pd.NA)
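
As a small sketch of the above (the three-row frame is made-up sample data, and pd.NA requires pandas 1.0 or later), note that replace returns a new DataFrame, so the result has to be assigned back:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

# replace returns a new DataFrame; assign it back to keep the change.
df = df.replace('N/A', pd.NA)

# isna() recognises pd.NA as a missing value.
missing_mask = df['y'].isna().tolist()
print(missing_mask)
```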

edited Nov 18 '21 at 04:57

answered Nov 12 '20 at 15:16


https://stackoverflow.com/questions/60115806/pd-na-vs-np-nan-for-pandas – Jérôme Jul 25 '22 at 08:54

score 17 · Answer 3 · answered Jan 14 '16 at 16:21

While using replace seems to solve the problem, I would like to propose an alternative. When a column mixes numeric values with a few strings, the real fix is not to replace the strings with np.nan but to give the whole column a proper type. I would bet that the original column is most likely of object dtype:

Name: y, dtype: object

What you really need is to make it a numeric column, with all non-numeric values replaced by NaN. The column then has a proper type, and numeric operations on it are considerably faster.

Thus, good conversion code would be

df['y'] = pd.to_numeric(df['y'], errors='coerce')

Specifying errors='coerce' forces strings that can't be parsed as a numeric value to become NaN. The column type would then be

Name: y, dtype: float64
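
A short sketch of the conversion on the question's sample data, showing that the coerced column supports numeric operations directly (y_dtype, n_missing and y_mean are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

# Strings that cannot be parsed as numbers ('N/A' here) become NaN.
df['y'] = pd.to_numeric(df['y'], errors='coerce')

y_dtype = str(df['y'].dtype)           # dtype of the converted column
n_missing = int(df['y'].isna().sum())  # number of coerced values
y_mean = df['y'].mean()                # NaN is skipped by default
print(y_dtype, n_missing, y_mean)
```

One thing to keep in mind: errors='coerce' turns every unparseable string into NaN, not only 'N/A', so typos in the data are silently blanked as well.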

jmorrison · Answer 4 · 2016-01-14T16:10:28.510

You can use replace:

df['y'] = df['y'].replace({'N/A': np.nan})

Also be aware of the inplace parameter for replace. You can do something like:

df.replace({'N/A': np.nan}, inplace=True)

This replaces every instance in df in place, so no reassignment is needed.

Similarly, if you run into other placeholder values, such as an empty string or None:

df['y'] = df['y'].replace({'': np.nan})

df['y'] = df['y'].replace({None: np.nan})

Reference: Pandas Latest - Replace
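
If several placeholder strings occur in the same column, they can also be handled in one call by passing a list to replace. This is a sketch on made-up data, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Made-up column containing two different placeholder strings.
s = pd.Series(['12', 'N/A', '', '13'], name='y')

# A list of values to replace, all mapped to the same replacement.
cleaned = s.replace(['N/A', ''], np.nan)

flags = cleaned.isna().tolist()
print(flags)
```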

score 12 · Answer 5 · answered Jul 29 '20 at 15:03

As of pandas 1.0.0, you no longer need to use numpy to create null values in your dataframe. Instead you can just use pandas.NA (which is of type pandas._libs.missing.NAType), so it will be treated as null within the dataframe but will not be null outside dataframe context.
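
The following sketch illustrates that distinction under the assumption of pandas 1.0 or later; the variable names are only for illustration:

```python
import pandas as pd

s = pd.Series(['12', pd.NA, '13'])

# Inside pandas, the entry is recognised as missing.
inside = s.isna().tolist()

# Outside pandas, pd.NA is its own singleton and is not Python's None.
is_none = pd.NA is None

# Comparisons with pd.NA propagate: the result is pd.NA, not True or False.
cmp = pd.NA == 1

print(inside, is_none, cmp)
```

Because comparisons return pd.NA rather than a boolean, use pd.isna(value) instead of == when checking a single value for missingness.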

slevin886


While this doesn't solve OP's problem, I upvoted because it actually answered the question in the title. – Teepeemm Sep 11 '20 at 20:47

score 1 · Answer 6 · answered Aug 06 '18 at 11:38

df.loc[df.y == 'N/A',['y']] = np.nan

This solves your problem. With the double [] in df[df.y == 'N/A']['y'], you are working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.

jeremie benichou


score 1 · Answer 7 · edited Sep 22 '22 at 08:59

To replace a value directly in the DataFrame, use the inplace argument.

df.replace('columnvalue', np.nan, inplace=True)

David Beauchemin


answered Aug 28 '20 at 00:34

sameer_nubia


score 1 · Answer 8 · answered Oct 18 '22 at 17:40

You can use where or mask:

df = df.where(df != 'N/A')

or

df = df.mask(df == 'N/A')
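
Both calls above act on the whole DataFrame. If only one column should be touched, the same methods exist on a Series. A sketch on made-up data:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

# mask blanks entries where the condition is True (NaN by default);
# where is the complement and keeps entries where the condition is True.
masked = df['y'].mask(df['y'] == 'N/A')
kept = df['y'].where(df['y'] != 'N/A')

# Both spellings produce the same column.
same = masked.equals(kept)

df['y'] = masked
print(same, df['y'].isna().tolist())
```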

Mykola Zotko


score 0 · Answer 9 · answered Sep 24 '17 at 03:40

You can try these snippets.

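In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
In [17]: df = pd.DataFrame(mydata)

In [18]: df.loc[df.y == "N/A", "y"] = np.nan  # single .loc call; chained df.y[...] = ... may not update df under copy-on-write

In [19]: df
Out[19]: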
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

score -1 · Answer 10 · answered Dec 30 '22 at 05:26

You can use the fillna method that pandas provides:

df.fillna(0,inplace=True)

The first parameter is the value that replaces each NA.

By default, the Pandas fillna method returns a new dataframe. (This is the default behavior because by default, the inplace parameter is set to inplace = False.)

If you set inplace = True, the method will return nothing, and will instead directly modify the dataframe that’s being operated on.
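
A brief sketch of the two modes on a made-up single-column frame. Note that fillna goes in the opposite direction from the question: it replaces existing NaN values rather than creating them.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value.
df = pd.DataFrame({'y': [12.0, np.nan, 13.0]})

# Default behaviour: a new DataFrame is returned and df is untouched.
filled = df.fillna(0)
missing_before = int(df['y'].isna().sum())

# With inplace=True, df itself is modified.
df.fillna(0, inplace=True)
missing_after = int(df['y'].isna().sum())

print(missing_before, missing_after)
```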
