23

The .csv-file I'm reading from contains cells with the value "NA". Pandas automatically converts these into NaN, which I don't want. I'm aware of the keep_default_na=False parameter, but that changes the dtype of the columns to object which means pd.get_dummies doesn't work correctly.

Is there any way to prevent pandas from reading "NA" as NaN without changing the dtype?

krntz
  • 613
  • 1
  • 6
  • 11
  • 1
    From the way your question is phrased, it sound like you want to pandas to read and store the string "NA" in a non-object column (e.g. a column of floats or integers). Is this the case? If so, no, this is not possible. – Alex Riley Jan 01 '17 at 17:17
  • @ajcr Oops, you're right. I didn't realize how absurd this was until now. Back to the drawing board I guess. – krntz Jan 01 '17 at 17:38
  • 1
    Does this answer your question? [Prevent pandas from interpreting 'NA' as NaN in a string](https://stackoverflow.com/questions/33952142/prevent-pandas-from-interpreting-na-as-nan-in-a-string) – David Buck Feb 19 '20 at 20:02

4 Answers4

25

keep_default_na=False works for me

from io import StringIO
import pandas as pd

txt = """col1,col2
a,b
NA,US"""

print(pd.read_csv(StringIO(txt), keep_default_na=False))

  col1 col2
0    a    b
1   NA   US

Without it

print(pd.read_csv(StringIO(txt)))

  col1 col2
0    a    b
1  NaN   US
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 6
    Also you should specify 'na_values' is you have some null values that you have to be interpreted as null. You can do it like that : na_values=['NULL','null', 'nan','NaN'] – Copacel Oct 05 '18 at 13:19
4

This is what Pandas documentation gives:

na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
David Buck
  • 3,752
  • 35
  • 31
  • 35
Koo
  • 1,442
  • 13
  • 16
3

This approach work for me:

import pandas as pd
df = pd.read_csv('Test.csv')
co1 col2  col3  col4
a   b    c  d   e
NaN NaN NaN NaN NaN
2   3   4   5   NaN

I copied the value and created a list which are by default interpreted as NaN then comment out NA which I wanted to be interpreted as not NaN. This approach still treat other values as NaN except for NA.

 na_values = ["", 
             "#N/A", 
             "#N/A N/A", 
             "#NA", 
             "-1.#IND", 
             "-1.#QNAN", 
             "-NaN", 
             "-nan", 
             "1.#IND", 
             "1.#QNAN", 
             "<NA>", 
             "N/A", 
#              "NA", 
             "NULL", 
             "NaN", 
             "n/a", 
             "nan", 
             "null"]
df1 = pd.read_csv('Test.csv',na_values=na_values,keep_default_na=False )

      co1  col2  col3  col4
a     b     c     d     e
NaN  NA   NaN    NA   NaN
2     3     4     5   NaN
Suman Shrestha
  • 161
  • 1
  • 6
0

You can try converting the column first to a str:

for index, row in df.iterrows():
    na_column = str(row['your_row'])
    if na_column != 'nan':
        # do something on column
carloliwanag
  • 348
  • 3
  • 12