9

I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed at this forum works, but only if the cell contains a space:

df.replace(r'\s+',np.nan,regex=True)

This code does not work when the cell is empty. Has anyone a suggestion for a panda code to replace empty cells.

Ryan M
  • 18,333
  • 31
  • 67
  • 74
Wannes Dermauw
  • 161
  • 1
  • 2
  • 8

4 Answers4

10

I think the easiest thing here is to do the replace twice:

In [117]:
df = pd.DataFrame({'a':['',' ','asasd']})
df

Out[117]:
       a
0       
1       
2  asasd

In [118]:
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)

Out[118]:
       a
0    NaN
1    NaN
2  asasd
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • 1
    Please note that this solution does not always work. In my case, it did not work for string `2015-09-02 06:13:40`, i.e, it resulted in converting this value to NaN (while that was not supposed to be happening). Check for more characters in the string: `.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan))` – Guido Feb 24 '16 at 14:14
  • In my case, `df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)` replaced all rows (including non empty rows) to NaN!!! – mOna Jun 08 '22 at 22:15
4

Both other answers do not take in account all characters in a string. This is better:

df.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan))

More docs on: Replacing blank values (white space) with NaN in pandas

Community
  • 1
  • 1
Guido
  • 6,182
  • 1
  • 29
  • 50
  • What does `( +\.)|#` do? Works great, by the way – Plasma Jan 23 '17 at 09:28
  • just curious, why are you using `|#` ? this replaced all the hex colors in my df: `#000000 ` with NaN. it's not like there are comments in a df... it's more likely that there are colors for plots.. – Claudiu Creanga Feb 12 '18 at 11:04
3

How about this?

df.replace(r'\s+|^$', np.nan, regex=True)
UNagaswamy
  • 2,068
  • 3
  • 30
  • 35
2

As you've already seen, if you do the obvious thing and replace() with None it throws an error:

df.replace('', None)
TypeError: cannot replace [''] with method pad on a DataFrame

The solution seems to be to simply replace the empty string with numpy's NaN.

import numpy as np
df.replace('', np.NaN)

While I'm not 100% sure that pd.NaN is treated in exactly the same way as np.NaN across all edge cases, I've not had any problems. fillna() works, persisting NULLs to database in place of np.NaN works, persisting NaN to csv works.

(Pandas version 18.1)

deepgeek
  • 155
  • 1
  • 1
  • 7