I have a dataframe with a date column stored as strings in the following format: 20180406T165358. I'm trying to parse it with to_datetime(). The format argument should be format='%Y%m%dT%H%M%S', but format='%Y-%m-%dT%H:%M:%S' also works. So my question is: what role do the '-' and ':' characters play when parsing a date?

Alex
  • `format='%Y-%m-%dT%H:%M:%S'` is actually wrong for your input. pandas is lenient enough to just ignore the mismatch; '-' and ':' have no special meaning. – FObersteiner Feb 05 '21 at 09:36

1 Answer

It's an interesting find. According to the docs, pandas follows the strptime() and strftime() format codes, but its behaviour here differs from that of the datetime module.

While pandas happily accepts both formats, datetime.datetime.strptime() raises a ValueError on the second one.

import datetime as dt
import pandas as pd

z = '20180406T165358'

dt.datetime.strptime(z, '%Y%m%dT%H%M%S')
Out[39]: datetime.datetime(2018, 4, 6, 16, 53, 58)

dt.datetime.strptime(z, '%Y-%m-%dT%H:%M:%S')
Traceback (most recent call last):

  File "<ipython-input-40-c37caf368af3>", line 1, in <module>
    dt.datetime.strptime(z, '%Y-%m-%dT%H:%M:%S')

  File "C:\ProgramData\Anaconda3\lib\_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)

  File "C:\ProgramData\Anaconda3\lib\_strptime.py", line 362, in _strptime
    (data_string, format))

ValueError: time data '20180406T165358' does not match format '%Y-%m-%dT%H:%M:%S'


pd.to_datetime([z], format='%Y%m%dT%H%M%S')
Out[41]: DatetimeIndex(['2018-04-06 16:53:58'], dtype='datetime64[ns]', freq=None)

pd.to_datetime([z], format='%Y-%m-%dT%H:%M:%S')
Out[42]: DatetimeIndex(['2018-04-06 16:53:58'], dtype='datetime64[ns]', freq=None)

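It appears that pandas simply ignores the separators here. A likely explanation is that pandas recognizes '%Y-%m-%dT%H:%M:%S' as an ISO 8601 format and hands the string to a dedicated ISO 8601 parser, which also accepts the compact form without separators. This may differ between pandas versions, so the format that matches the data exactly is the safer choice.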
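
If you want to be sure a format string really matches your data before passing it to pandas, one option is to check a sample value with the standard library first, since strptime() treats '-' and ':' as literal characters that must be present in the input. A minimal sketch, where the helper name `matches_format` is hypothetical and not part of pandas or datetime:

```python
from datetime import datetime


def matches_format(value: str, fmt: str) -> bool:
    """Return True only if `value` matches `fmt` exactly, as datetime.strptime sees it.

    Hypothetical helper for illustration only.
    """
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        # strptime rejects any mismatch, including missing '-' or ':' literals
        return False
    return True


z = "20180406T165358"

compact_ok = matches_format(z, "%Y%m%dT%H%M%S")       # format without separators
dashed_ok = matches_format(z, "%Y-%m-%dT%H:%M:%S")    # format with '-' and ':'

print(compact_ok, dashed_ok)
```

Checking one or two representative values this way is cheap and does not depend on how a particular pandas version treats ISO-like formats.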

NotAName
  • pandas just ignores the mismatched format directive (`'%Y-%m-%dT%H:%M:%S'` != `'%Y%m%dT%H%M%S'`), which `strptime` doesn't. Seems like "convenient" vs. "deterministic"... By the way, for strings that really are in the format '%Y-%m-%dT%H:%M:%S', use [fromisoformat](https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat); it is [more efficient](https://stackoverflow.com/a/61710371/10197418) ;-) – FObersteiner Feb 05 '21 at 09:34
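
To illustrate the fromisoformat suggestion in the comment above, here is a small sketch using a string that does contain the separators. The sample value is an assumption made for the example: it is the question's timestamp rewritten with '-' and ':'. The compact form from the question (without separators) may not be accepted by fromisoformat on Python versions before 3.11.

```python
from datetime import datetime

# Assumed sample: the question's timestamp written with '-' and ':' separators
s = "2018-04-06T16:53:58"

# Dedicated ISO 8601 parser, no format string needed
via_iso = datetime.fromisoformat(s)

# Equivalent parse with an explicit format string
via_strptime = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")

print(via_iso == via_strptime)
```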