0

I have downloaded a database table into a csv file.

Now one of the columns has both empty cells and also some of them are written as NULL. Below is the sample dataframe created by reading that CSV file in pandas:

df:

Col1    Col2      Amount1    Amount2
A       NULL      100.22     100.22
A       NULL      100.22     100.22
A       Virgo     100.22     100.22
A       Arkenea   100.22     100.22
A                 100.22     100.22
A                 100.22     100.22

Now when i execute the below code i get the following output:

df[(df['Col1']=="B")].unique()

Output:

('nan', 'Virgo', 'Arkenea')

Here i am not able to differentiate between NULL and empty cells as both are shown as 'nan'. Also if i do fillna, both the NULL and empty columns get updated with the new value. Is there any way i can read the file so that NULL and empty cells are shown separately.

Any leads on this are appreciated.

RSM
  • 645
  • 10
  • 25

1 Answers1

2

Based on my understanding here is something which i was able to do,

>>> import pandas as pd
>>> import numpy as np
>>> d = {'col1': [1, 2, 2, 3], 'col2': [3, 4, 5, 6], 'col3': ['NULL','NULL', np.nan, 'virgo']}
>>> df = pd.DataFrame(data=d)
>>> df.isnull()
    col1   col2   col3
0  False  False  False
1  False  False  False
2  False  False   True
3  False  False  False
>>> df['col3'].unique()
array(['NULL', nan, 'virgo'], dtype=object)

When i save the above file and try to read it again then it takes all the NULL and blank entries as NaN values. To fix it,

df1 = pd.read_csv('/tmp/test.csv', keep_default_na=False, na_values=[''])

This worked for me.

>>> df1.isnull()
   Unnamed: 0   col1   col2   col3
0       False  False  False  False
1       False  False  False  False
2       False  False  False   True
3       False  False  False  False
Akash Ranjan
  • 922
  • 12
  • 24