pandas read_csv and setting na_values to any string in the csv file

Question

data.csv

1, 22, 3432
1, 23, \N
2, 24, 54335
2, 25, 3928

I have a csv file of data collected from a device. Every now and then the device doesn't relay information and outputs '\N'. I want to treat these values as NaN, which I did with

pd.read_csv('data.csv', na_values=['\\N'])

which worked fine. However, I would prefer to have not only this string turned to NaN but any string that is in the csv file just in case the data I get in the future has a different string.

Is it possible for me to change the argument so that it covers all strings?

I mean, what if I get data that looks like, for example, 'Kud'? I would like my script to be as flexible as possible so that it keeps working if '\\N' is changed. Note: '\\N' does not appear in the csv file. It looks like \N, in case that is confusing. - zipline86, Sep 07 '18 at 21:43

Abhi · Accepted Answer · 2018-09-07T22:15:03.647

You have to manually pass all such strings as a list or dict to na_values:

na_values : list-like or dict, default None
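A minimal sketch of both forms, assuming the file has no header row. The column names "id", "reading" and "value" are made up for illustration, the 'Kud' row is added to show a second placeholder, and the data is read from an in-memory string rather than data.csv so the snippet runs on its own:

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; the last row uses 'Kud' instead of '\N'.
csv_text = "1, 22, 3432\n1, 23, \\N\n2, 24, 54335\n2, 25, Kud\n"
names = ["id", "reading", "value"]  # illustrative names, not from the question

# List form: each listed string is treated as missing in every column.
df_list = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=names,
    skipinitialspace=True,  # drop the space that follows each comma
    na_values=["\\N", "Kud"],
)

# Dict form: missing-value strings are declared per column.
df_dict = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=names,
    skipinitialspace=True,
    na_values={"value": ["\\N", "Kud"]},
)

print(df_list)
print(df_dict)
```

Either way, every placeholder string has to be known in advance, which is the limitation the question is about.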

Alternatively, use pd.to_numeric with errors='coerce' to convert all values to numeric after reading the csv file. Anything that cannot be parsed as a number becomes NaN, whatever the string is.

sample input df:

    A    B
0   1    2
1   0   \N
2  \N    8
3  11    5
4  11  Kud

df = df.apply(pd.to_numeric, errors='coerce')

output:

      A    B
0   1.0  2.0
1   0.0  NaN
2   NaN  8.0
3  11.0  5.0
4  11.0  NaN
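Applied to the file from the question, the whole thing might look like the sketch below. It assumes there is no header row and that every column is meant to be numeric. The data is read from an in-memory string standing in for data.csv, and the 'Kud' row is added for illustration:

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; the last row uses 'Kud' instead of '\N'.
csv_text = "1, 22, 3432\n1, 23, \\N\n2, 24, 54335\n2, 25, Kud\n"

# Read without na_values; the third column comes in as strings (object dtype).
df = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

# Convert column by column; unparseable entries are replaced with NaN.
df = df.apply(pd.to_numeric, errors="coerce")

print(df)
print(df.dtypes)
```

Keep in mind that this coerces every column, so a column that legitimately holds text would end up entirely NaN. In that case, apply pd.to_numeric only to the columns that should be numeric.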
