NA values in column is not NaN Pandas Python

Question

I have a CSV file with a column Product. One of the products in it is called 'NA'. I want to select all 'NA' products, but when the file is read in Python with pandas, the 'NA' products become NaN values.

CSV looks like this:

Contact	Product
1	NA
2	ZE
3	HE
3

In Python I get this:

Contact	Product
1	NaN
2	ZE
3	HE
4	NaN

How can I change this?

please add a minimal reproducible output, https://stackoverflow.com/help/minimal-reproducible-example — Naga kiran, Aug 05 '21 at 07:50

score 1 · Answer 1 · answered Aug 05 '21 at 07:54

The read_csv method in pandas has the parameters na_values and keep_default_na, detailed in the documentation, which determine which text values are converted to NaN. At its most basic level, you could do:

import pandas as pd

df = pd.read_csv(your_file, keep_default_na=False)

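Your 'NA' strings would then no longer be converted to NaN. However, this also turns off every other default missing-value marker (including empty fields), which may have unintended consequences elsewhere in your data. You can refine the behaviour by listing the markers you still want in na_values.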
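As a minimal sketch of the difference, the snippet below reads the same data twice, once with the defaults and once with keep_default_na=False. The in-memory text csv_text is a hypothetical stand-in for the file in the question (assumed to be tab-separated), and the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV in the question (assumed tab-separated).
csv_text = "Contact\tProduct\n1\tNA\n2\tZE\n3\tHE\n4\t\n"

# Default parsing: the literal 'NA' and the empty field both become NaN.
default_df = pd.read_csv(io.StringIO(csv_text), sep="\t")

# keep_default_na=False with no na_values: no string is treated as missing,
# so 'NA' stays a string and the empty field becomes an empty string.
raw_df = pd.read_csv(io.StringIO(csv_text), sep="\t", keep_default_na=False)

# Select the rows whose product is literally 'NA'.
na_products = raw_df[raw_df["Product"] == "NA"]
print(na_products)
```

Note that the empty field in the last row is now an empty string rather than NaN, which is the kind of side effect that na_values can be used to correct.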

score 1 · Answer 2 · answered Aug 05 '21 at 07:55

According to the read_csv documentation, you can do:

df = pd.read_csv("filename CSV", keep_default_na=False)

mozway


score 0 · Answer 3 · answered Aug 05 '21 at 07:55

From the pandas.read_csv documentation:

na_values: scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values.
By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

So NA is read as NaN by default. There is an additional parameter called keep_default_na; pass False to it if you want to turn off that default behavior:

df = pd.read_csv(path, keep_default_na=False)

   Contact Product
0        1      NA
1        2      ZE
2        3      HE
3        3

You may then need to specify which values pandas should still treat as NaN, by passing na_values either a list of such values or a dictionary whose keys are column names and whose values are the markers to treat as NaN in that column.

df = pd.read_csv(path, keep_default_na=False, na_values=[''])

   Contact Product
0        1      NA
1        2      ZE
2        3      HE
3        3     NaN
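The dictionary form mentioned above can be sketched as follows. This is an illustration only: the Region column and the in-memory csv_text are hypothetical and not part of the question's data. They are there to show 'NA' kept as a real value in one column while still counting as missing in another.

```python
import io

import pandas as pd

# Hypothetical data: 'Region' is an invented column for illustration.
csv_text = "Contact,Product,Region\n1,NA,NA\n2,ZE,EU\n3,HE,\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,  # drop the built-in list of NaN markers
    na_values={
        "Product": [""],        # only empty fields are missing here
        "Region": ["", "NA"],   # here 'NA' still means missing
    },
)
print(df)
```

Columns that are not listed in the dictionary (Contact here) get no NaN markers at all when keep_default_na=False.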
