I am writing a small application in Python, and my data-import functions raise errors when the files (txt, dat, csv, ...) contain missing values written as NAN or "NAN". There is no problem importing the data when these values are written as nan or NaN.

For example:

06.02.2011 00:10:00 NAN 43 30 2 37 42 30 2 34 41 19 4 302 5 306 8 69 2810 2811 2810 974 46 130
06.02.2011 00:20:00 36 41 28 2 36 42 27 2 35 42 26 3 295 8 298 8 69 2811 2811 2811 974 46 130

The value NAN in the first row raises errors because it is treated as a string within the data.

A file that uses nan instead causes no problems, because nan is treated as a missing value:

06.02.2011 00:10:00 nan 43 30 2 37 42 30 2 34 41 19 4 302 5 306 8 69 2810 2811 2810 974 46 130
06.02.2011 00:20:00 36 41 28 2 36 42 27 2 35 42 26 3 295 8 298 8 69 2811 2811 2811 974 46 130

I do not know which import function or library in Python I need to modify so that every spelling of NaN is recognized and the errors are avoided.

gis20

2 Answers

You can add the strings that you'd like to be interpreted as NaN to the na_values argument of pd.read_csv:

df = pd.read_csv('your_file.csv', na_values=['NAN'])

You can also find more information in that answer.

The default NA values, as listed in the documentation of na_values:

The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']. Although a 0-length string '' is not included in the default NaN values list, it is still treated as a missing value.
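Since NAN is not in that default list, it has to be passed explicitly. Here is a minimal sketch using a shortened version of the rows from the question. The in-memory sample, the whitespace separator (sep=r"\s+") and header=None are assumptions made for illustration, because the example rows look space-delimited and have no header line:

```python
import io

import pandas as pd

# Shortened sample rows from the question (assumed to be whitespace-delimited).
sample = (
    "06.02.2011 00:10:00 NAN 43 30\n"
    "06.02.2011 00:20:00 36 41 28\n"
)

# na_values adds "NAN" to the strings that pandas treats as missing.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",       # assumption: columns are separated by whitespace
    header=None,      # assumption: the file has no header row
    na_values=["NAN"],
)

print(df.dtypes)                 # column 2 should now be a float column
print(df[2].isna().tolist())     # which rows of column 2 are missing
```

Without na_values, the column containing NAN would be read as strings rather than numbers, which is the kind of behaviour described in the question.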

Anton Protopopov

What kind of error do you get? It seems to work fine for me. I tried both Python 2 and Python 3.

>>> float("NAN")
nan
>>> float("NaN")
nan
>>> float("nan")
nan

Perhaps you are trying to convert to int? The int type doesn't allow a "NaN" value.
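To illustrate the difference, here is a small sketch using only the standard library. The tokens list and the quoted '"NAN"' case are taken from the question as an assumption about what the file contains; the variable names are made up for this example:

```python
import math

# float() accepts any capitalization of "nan".
tokens = ["NAN", "NaN", "nan"]
parsed = [float(t) for t in tokens]
print(all(math.isnan(v) for v in parsed))

# int() has no NaN value, so the same token raises ValueError.
try:
    int("NAN")
    int_failed = False
except ValueError as exc:
    int_failed = True
    print("int() rejected the token:", exc)

# If the quotes are part of the text, float() fails as well,
# so they have to be stripped before converting.
quoted = '"NAN"'
try:
    float(quoted)
    quoted_failed = False
except ValueError:
    quoted_failed = True

cleaned = float(quoted.strip('"'))
print(quoted_failed, math.isnan(cleaned))
```

So if the import code converts with int(), or reads the quote characters as part of the value, the conversion fails even though float("NAN") itself is fine.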

LtWorf