21

I am preparing a pandas df for output, and would like to remove the NaN and NaT in the table, and leave those table locations blank. An example would be

mydataframesample 

col1    col2     timestamp
a       b        2014-08-14
c       NaN      NaT

would become

col1    col2     timestamp
a       b        2014-08-14
c       

Most of the values are dtypes object, with the timestamp column being datetime64[ns]. In order to fix this, I attempted to use panda's mydataframesample.fillna(' ') to effectively leave a space in the location. However, this doesn't work with the datetime types. In order to get around this, I'm trying to convert the timestamp column back to object or string type.

Is it possible to remove the NaN/NaT without doing the type conversion? If not, how do I do the type conversion (tried str() and astype(str) but difficulty with datetime being the original format)?

ding
  • 585
  • 3
  • 7
  • 13
  • 1
    I don't think you can replace the datetime `NaT` as you've found, what is the problem with having `NaN`/`NaT`'s? – EdChum Aug 05 '14 at 14:55
  • 2
    What do you mean by 'output'? In some cases (e.g. saving to CSV) the `NaN`/`NaT` will automatically be filled with blanks. – chrisb Aug 05 '14 at 15:18
  • I'm converting to html, and sending it as an e-mail. Will the NaN/NaT still automatically be filled with blanks @chrisb? – ding Aug 05 '14 at 15:19

4 Answers4

20

I had the same issue: This does it all in place using pandas apply function. Should be the fastest method.

import pandas as pd
df['timestamp'] = df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d')if not pd.isnull(x) else '')

if your timestamp field is not yet in datetime format then:

import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp']).apply(lambda x: x.strftime('%Y-%m-%d')if not pd.isnull(x) else '')
Alexander McFarlane
  • 10,643
  • 9
  • 59
  • 100
14

This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:

df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

import numpy as np
import pandas as pd
Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT
df1 = pd.DataFrame({
    'col1': list('ac'),
    'col2': ['b', nan],
    'date': (Timestamp('2014-08-14'), NaT)
    })

df1['col2'] = df1['col2'].fillna('')
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]

print(df1)

yields

  col1 col2        date
0    a    b  2014-08-14
1    c                 
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
3

@unutbu's answer will work fine, but if you don't want to modify the DataFrame, you could do something like this. to_html takes a parameter for how NaN is represented, to handle the NaT you need to pass a custom formatting function.

date_format = lambda d : pd.to_datetime(d).strftime('%Y-%m-%d') if not pd.isnull(d) else ''

df1.to_html(na_rep='', formatters={'date': date_format})
chrisb
  • 49,833
  • 8
  • 70
  • 70
0

If all you want to do is convert to a string:

In [37]: df1.to_csv(None,sep=' ')
Out[37]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c  \n'

To replace missing values with a string

In [36]: df1.to_csv(None,sep=' ',na_rep='missing_value')
Out[36]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c missing_value missing_value\n'
Jeff
  • 125,376
  • 21
  • 220
  • 187