pandas. read_csv reading NULL and empty spaces as nan

Question

I have downloaded a database table into a csv file.

Now one of the columns has both empty cells and also some of them are written as NULL. Below is the sample dataframe created by reading that CSV file in pandas:

df:

Col1    Col2      Amount1    Amount2
A       NULL      100.22     100.22
A       NULL      100.22     100.22
A       Virgo     100.22     100.22
A       Arkenea   100.22     100.22
A                 100.22     100.22
A                 100.22     100.22

Now when i execute the below code i get the following output:

df[(df['Col1']=="B")].unique()

Output:

('nan', 'Virgo', 'Arkenea')

Here i am not able to differentiate between NULL and empty cells as both are shown as 'nan'. Also if i do fillna, both the NULL and empty columns get updated with the new value. Is there any way i can read the file so that NULL and empty cells are shown separately.

Any leads on this are appreciated.

Please ignore typos, if any. this is not the actual data frame but a mock up data. — RSM, Nov 29 '20 at 05:49
`pd.read_csv(r'\path\to\file.csv',keep_default_na=False)` ? THen you can use `na_values` whatever you want to be treated as `NaN` — anky, Nov 29 '20 at 06:27

Akash Ranjan · Accepted Answer · 2020-12-07T03:33:26.757

Based on my understanding here is something which i was able to do,

>>> import pandas as pd
>>> import numpy as np
>>> d = {'col1': [1, 2, 2, 3], 'col2': [3, 4, 5, 6], 'col3': ['NULL','NULL', np.nan, 'virgo']}
>>> df = pd.DataFrame(data=d)
>>> df.isnull()
    col1   col2   col3
0  False  False  False
1  False  False  False
2  False  False   True
3  False  False  False
>>> df['col3'].unique()
array(['NULL', nan, 'virgo'], dtype=object)

When i save the above file and try to read it again then it takes all the NULL and blank entries as NaN values. To fix it,

df1 = pd.read_csv('/tmp/test.csv', keep_default_na=False, na_values=[''])

This worked for me.

>>> df1.isnull()
   Unnamed: 0   col1   col2   col3
0       False  False  False  False
1       False  False  False  False
2       False  False  False   True
3       False  False  False  False

pandas. read_csv reading NULL and empty spaces as nan

1 Answers1