
I have NaN values in my dataframe and I want to replace them with an empty string.

What I've tried so far, which isn't working:

df_conbid_N_1 = pd.read_csv("test-2019.csv",dtype=str, sep=';', encoding='utf-8')
df_conbid_N_1['Excep_Test'] = df_conbid_N_1['Excep_Test'].replace("NaN","")
yatu

3 Answers


Use fillna (docs). For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'no': [1, 2, 3],
                   'Col1': ['State', 'City', 'Town'],
                   'Col2': ['abc', np.nan, 'defg'],
                   'Col3': ['Madhya Pradesh', 'VBI', 'KJI']})

df

   no   Col1  Col2            Col3
0   1  State   abc  Madhya Pradesh
1   2   City   NaN             VBI
2   3   Town  defg             KJI

df['Col2'] = df['Col2'].fillna('')
df

   no   Col1  Col2            Col3
0   1  State   abc  Madhya Pradesh
1   2   City                   VBI
2   3   Town  defg             KJI
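The attempt in the question fails because read_csv stores an empty field as a missing-value marker (NaN), not as the three-character text "NaN", so replace("NaN", "") has nothing to match. A minimal sketch applying fillna to the asker's column instead; the in-memory CSV below is a hypothetical stand-in for test-2019.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for test-2019.csv: row 2 has an empty Excep_Test field
csv_text = "id;Excep_Test\n1;abc\n2;\n"

df_conbid_N_1 = pd.read_csv(io.StringIO(csv_text), dtype=str, sep=';')

# The empty field was parsed as a missing value, not as the string "NaN"
print(df_conbid_N_1['Excep_Test'].isna().tolist())

# Replacing the *string* "NaN" therefore leaves the missing value in place
unchanged = df_conbid_N_1['Excep_Test'].replace("NaN", "")
print(unchanged.isna().tolist())

# fillna targets the missing-value marker itself; assign the result back
df_conbid_N_1['Excep_Test'] = df_conbid_N_1['Excep_Test'].fillna('')
print(df_conbid_N_1['Excep_Test'].tolist())
```

Assigning the result back, rather than calling fillna(..., inplace=True) on a selected column, keeps working under pandas' Copy-on-Write behaviour.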
dumbledad
meW

You can chain fillna directly onto read_csv:

df_conbid_N_1 = pd.read_csv("test-2019.csv",dtype=str, sep=';',encoding='utf-8').fillna("")
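A quick check of the chained call, again using a hypothetical in-memory CSV in place of test-2019.csv. For comparison it also shows read_csv's keep_default_na=False option, which stops empty fields from being turned into NaN in the first place; note that with this option literal text such as "NaN" or "NA" in the file is also kept as a string instead of being treated as missing, so the two approaches only coincide when the file has no such markers.

```python
import io

import pandas as pd

# Hypothetical stand-in for test-2019.csv
csv_text = "id;Excep_Test\n1;abc\n2;\n"

# Chain fillna onto read_csv: every missing field becomes ''
filled = pd.read_csv(io.StringIO(csv_text), dtype=str, sep=';').fillna("")

# Alternative: don't parse empty fields (or default NA strings) as missing at all
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, sep=';',
                  keep_default_na=False)

print(filled['Excep_Test'].tolist())
print(raw['Excep_Test'].tolist())
```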
Hemil Patel

pandas provides fillna for filling missing values.


Let's go through some use cases with a sample dataframe:

df = pd.DataFrame({'col1':['John', np.nan, 'Anne'], 'col2':[np.nan, 3, 4]})

   col1  col2
0  John   NaN
1   NaN   3.0
2  Anne   4.0

As mentioned in the docs, fillna accepts the following as fill values:

value : scalar, dict, Series, or DataFrame

So we can fill with a constant value, such as an empty string:

df.fillna('')

   col1  col2
0  John
1         3.0
2  Anne   4.0
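One side effect worth knowing: a float64 column cannot hold '', so filling a numeric column with an empty string upcasts it to object dtype, mixing strings and floats. A small sketch on the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['John', np.nan, 'Anne'], 'col2': [np.nan, 3, 4]})
print(df['col2'].dtype)  # float64 before filling

filled = df.fillna('')
# '' is not a float, so col2 is upcast to object and now mixes '' with floats
print(filled['col2'].dtype)
print(filled['col2'].tolist())
```

If you still need to do arithmetic on such a column, consider filling only the text columns, as the dict form lets you do.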

You can also pass a dict mapping each column name to its fill value:

df.fillna({'col1':'Alex', 'col2':2})

   col1  col2
0  John   2.0
1  Alex   3.0
2  Anne   4.0
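Columns that are not listed in the dict are left untouched. That makes the dict form a convenient way to blank out only the text columns while leaving numeric columns as float64, for example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['John', np.nan, 'Anne'], 'col2': [np.nan, 3, 4]})

# Only col1 is listed, so col2 keeps its NaN and stays float64
partial = df.fillna({'col1': ''})
print(partial['col1'].tolist())
print(partial['col2'].isna().sum())
print(partial['col2'].dtype)
```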

Or you can fill from another pd.Series or pd.DataFrame, whose values are matched to df by index and column labels:

df_other = pd.DataFrame({'col1':['John', 'Franc', 'Anne'], 'col2':[5, 3, 4]})

df.fillna(df_other)

    col1  col2
0   John   5.0
1  Franc   3.0
2   Anne   4.0
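Because values are matched by label rather than by position, a cell is only filled if the other frame has an entry with the same row and column label. A small sketch where the (made-up) fill frame only has a row labelled 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['John', np.nan, 'Anne'], 'col2': [np.nan, 3, 4]})

# df_other only has index label 0, so only row 0 of df can be filled from it
df_other = pd.DataFrame({'col1': ['Zoe'], 'col2': [5]}, index=[0])

result = df.fillna(df_other)
print(result)
# Row 0: col1 was not missing, so 'John' is kept; col2 NaN becomes 5.0
# Row 1: col1 stays NaN because df_other has no row labelled 1
```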

Filling from a Series is especially useful because it lets you fill each column's missing values with a statistic computed from that same column, such as its mean or mode. Say we have the following random frame (your values will differ):

df = pd.DataFrame(np.random.choice(np.r_[np.nan, np.arange(3)], (3,5)))
print(df)

     0    1    2    3    4
0  NaN  NaN  0.0  1.0  2.0
1  NaN  2.0  NaN  2.0  1.0
2  1.0  1.0  2.0  NaN  NaN

Then we can simply do:

df.fillna(df.mean())

     0    1    2    3    4
0  1.0  1.5  0.0  1.0  2.0
1  1.0  2.0  1.0  2.0  1.0
2  1.0  1.0  2.0  1.5  1.5
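The same pattern works for the mode, with one twist: df.mode() returns a DataFrame (a column can have several equally common values), so take its first row to get a single fill value per column, indexed by column name just like df.mean(). A minimal sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 1.0, 2.0],
                   'b': [np.nan, 3.0, 3.0, 0.0]})

# First row of df.mode(): one most-frequent value per column (NaN is ignored)
modes = df.mode().iloc[0]

filled = df.fillna(modes)
print(filled)
```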
yatu