Prevent pandas from reading "NA" as NaN

Question

The .csv-file I'm reading from contains cells with the value "NA". Pandas automatically converts these into NaN, which I don't want. I'm aware of the keep_default_na=False parameter, but that changes the dtype of the columns to object which means pd.get_dummies doesn't work correctly.

Is there any way to prevent pandas from reading "NA" as NaN without changing the dtype?

From the way your question is phrased, it sound like you want to pandas to read and store the string "NA" in a non-object column (e.g. a column of floats or integers). Is this the case? If so, no, this is not possible. — Alex Riley, Jan 01 '17 at 17:17
@ajcr Oops, you're right. I didn't realize how absurd this was until now. Back to the drawing board I guess. — krntz, Jan 01 '17 at 17:38
Does this answer your question? [Prevent pandas from interpreting 'NA' as NaN in a string](https://stackoverflow.com/questions/33952142/prevent-pandas-from-interpreting-na-as-nan-in-a-string) — David Buck, Feb 19 '20 at 20:02

score 25 · Answer 1 · answered Jan 01 '17 at 17:09

25

keep_default_na=False works for me

from io import StringIO
import pandas as pd

txt = """col1,col2
a,b
NA,US"""

print(pd.read_csv(StringIO(txt), keep_default_na=False))

  col1 col2
0    a    b
1   NA   US

Without it

print(pd.read_csv(StringIO(txt)))

  col1 col2
0    a    b
1  NaN   US

answered Jan 01 '17 at 17:09

piRSquared

285,575
57
475
624

6

Also you should specify 'na_values' is you have some null values that you have to be interpreted as null. You can do it like that : na_values=['NULL','null', 'nan','NaN'] – Copacel Oct 05 '18 at 13:19

score 4 · Answer 2 · edited Feb 19 '20 at 19:59

This is what Pandas documentation gives:

na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.

Suman Shrestha · Answer 3 · 2021-03-19T20:57:20.880

This approach work for me:

import pandas as pd
df = pd.read_csv('Test.csv')
co1 col2  col3  col4
a   b    c  d   e
NaN NaN NaN NaN NaN
2   3   4   5   NaN

I copied the value and created a list which are by default interpreted as NaN then comment out NA which I wanted to be interpreted as not NaN. This approach still treat other values as NaN except for NA.

 na_values = ["", 
             "#N/A", 
             "#N/A N/A", 
             "#NA", 
             "-1.#IND", 
             "-1.#QNAN", 
             "-NaN", 
             "-nan", 
             "1.#IND", 
             "1.#QNAN", 
             "<NA>", 
             "N/A", 
#              "NA", 
             "NULL", 
             "NaN", 
             "n/a", 
             "nan", 
             "null"]
df1 = pd.read_csv('Test.csv',na_values=na_values,keep_default_na=False )

      co1  col2  col3  col4
a     b     c     d     e
NaN  NA   NaN    NA   NaN
2     3     4     5   NaN

score 0 · Answer 4 · answered Jan 01 '17 at 17:01

0

You can try converting the column first to a str:

for index, row in df.iterrows():
    na_column = str(row['your_row'])
    if na_column != 'nan':
        # do something on column

answered Jan 01 '17 at 17:01

carloliwanag

348
3
12

Prevent pandas from reading "NA" as NaN

4 Answers4