1

In Pandas, I have column with dates of format: %Y/%m/%d (e.g. 2015/10/31). I want to change this format to another like: %d-%m-%y (e.g. 31-10-15).

Turning the column into the correct object for later sorting:

df['Date'] = pd.to_datetime(df['Date'])

Applying strptime:

df['Date'] = df['Date'].apply(lambda x:
    datetime.strptime(x,'%d-%m-%y'))

TypeError: must be str, not Timestamp

Also, oddly, if the dates enter pandas dataframe in anything other than the default ISO standard then weird results occur on sorting. Such as inconsistent formats and/or not properly sorting:

0  2015-01-31
1  2016-15-01
Prof
  • 657
  • 1
  • 14
  • 24

2 Answers2

2

Because you are dealing with a datetime object already, you are getting this error because strptime requires a string and not a Timestamp object. From the definition of strptime:

def strptime(cls, date_string, format):
        'string, format -> new datetime parsed from a string (like time.strptime()).'

What you are actually looking to do is first convert your datetime to the format you require in to a string using strftime:

def strftime(self, format):
        """Return a string representing the date and time, controlled by an
        explicit format string.

and then bring it back to a datetime object using strptime. The following demo will demonstrate. Note the use of .date() at the end in order to remove the unneeded 00:00:00 time portion.

>>> from datetime import datetime
>>> orig_datetime_obj = datetime.strptime("2015/10/31", '%Y/%m/%d').date()
>>> print(orig_datetime_obj)
2015-10-31
>>> print(type(orig_datetime_obj))
<type 'datetime.datetime'>
>>> new_datetime_obj = datetime.strptime(orig_datetime_obj.strftime('%d-%m-%y'), '%d-%m-%y').date()
>>> print(new_datetime_obj)
2015-10-31
>>> print(type(new_datetime_obj))
<type 'datetime.date'>

Alternatively, if all you require is just converting it to a different format but in a string, you can simply stick to just using strftime with your new format. Using my example above, you would only need this portion:

orig_datetime_obj.strftime('%d-%m-%y')
idjaw
  • 25,487
  • 7
  • 64
  • 83
  • @Prof I noticed you removed the accepted answer. Is there a particular problem? Or was there an easier way to solve this in pandas? I'm asking out of curiosity :) – idjaw Oct 31 '15 at 15:12
  • I unaccepted the answer just now because I realised that I wasn't clear with my question. My code consists of a string with `.strptime` applied to it and then `.strftime` used to change it to the correct format. However, Pandas does not sort the columns with respect to date properly so I have to use a format that is sorted properly above. This format is `'%Y/%m/%d'`. Using this format instead, Pandas sorts the columns properly. But I'd like it to be sorted and then have the values return to the format originally desired: `'%d-%m-%y'`. – Prof Oct 31 '15 at 15:22
  • So I feel the format change must happen either inside the `.to_datetime` or the `df[].apply`. You realised that I unaccepted before I could finish, my apologies! – Prof Oct 31 '15 at 15:23
  • Have you got any ideas? – Prof Nov 01 '15 at 11:21
  • Apologies. Didn't mean to leave you hanging. For the order of data when changing the format, I am not too sure tbh. I actually don't use pandas all that much. Sorry. =/ – idjaw Nov 01 '15 at 12:15
0

The solution to one part of the question:

I want to change this format to another like: %d-%m-%y (e.g. 31-10-15).

... If the dates enter pandas dataframe in anything other than the default ISO standard then weird results occur on sorting. Such as inconsistent formats and/or not properly sorting

is actually really simple. Pandas was having issues because it considers the dd-mm-yy format international and an additional dayfirst argument must be entered for it to be treated properly:

df['Date'] = pd.to_datetime(df['Date'], dayfirst = [True])

May also need this (currently works for parsing data but not displaying):

pd.set_option('display.date_dayfirst', True)

Community
  • 1
  • 1
Prof
  • 657
  • 1
  • 14
  • 24