I have a date input 'date_dob' with the value '20-Apr-53'. I tried converting it to the format yyyy-mm-dd using the following code:
print(pd.to_datetime(date_dob, format='%d-%b-%y'))
which returns '2053-04-20 00:00:00' instead of '1953-04-20 00:00:00'.
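Python's strptime follows the POSIX convention for %y: values 69-99 map to 1969-1999 and values 0-68 map to 2000-2068, so '53' becomes 2053. You need to check the year explicitly and, if it landed in the 2000s, subtract 100 years: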
>>> import datetime
>>> my_date = '20-Apr-53'
>>> d = datetime.datetime.strptime(my_date, '%d-%b-%y')
>>> d
datetime.datetime(2053, 4, 20, 0, 0)
# ^ year = 2053
>>> if d.year > 2000:  # assumes no date of birth falls after 2000
...     d = d.replace(year=d.year - 100)  # move the date back 100 years
...
>>> d
datetime.datetime(1953, 4, 20, 0, 0)
# ^ year = 1953
>>> d.strftime('%Y-%m-%d') # %Y instead of %y, reason mentioned below
'1953-04-20'
# ^ all four digit of year
The strftime call uses %Y instead of %y because:
%y : displays the last two digits of the year
%Y : displays all four digits of the year
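The hard-coded check against 2000 breaks for anyone born in 2001 or later. A minimal sketch of an alternative, assuming a date of birth can never lie in the future: compare the parsed date with today's date and only then move it back a century. The name parse_dob and its today parameter are hypothetical, introduced here for illustration only.

```python
import datetime


def parse_dob(text, today=None):
    """Parse a 'dd-Mon-yy' date of birth, assuming it cannot be in the future."""
    if today is None:
        today = datetime.date.today()
    # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
    d = datetime.datetime.strptime(text, '%d-%b-%y').date()
    if d > today:
        # A birth date after today must belong to the previous century
        d = d.replace(year=d.year - 100)
    return d


# Format with %Y to get the four-digit year in the output
print(parse_dob('20-Apr-53').strftime('%Y-%m-%d'))
```

Passing today explicitly makes the cutoff reproducible, for example when re-running the conversion on old data.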
The official doc is your friend. Consult it ;-)
%y -> Year without century as a zero-padded decimal number.
%Y -> Year with century as a decimal number.
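So use %Y wherever you want the four-digit year, for example in the output format. Note that a parsing format such as
format='%d-%b-%Y'
only matches input that already contains a four-digit year ('20-Apr-1953'). For '20-Apr-53', keep %y when parsing and correct the century afterwards.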
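A quick way to check how the two directives behave when parsing, using only the standard library's datetime.strptime. The helper try_parse is hypothetical, not part of any library; it just turns a failed match into None so the three cases can be compared side by side.

```python
import datetime


def try_parse(text, fmt):
    """Return the parsed datetime, or None if text does not match fmt."""
    try:
        return datetime.datetime.strptime(text, fmt)
    except ValueError:
        return None


print(try_parse('20-Apr-53', '%d-%b-%Y'))    # None: %Y needs four digits
print(try_parse('20-Apr-1953', '%d-%b-%Y'))  # 1953-04-20 00:00:00
print(try_parse('20-Apr-53', '%d-%b-%y'))    # 2053-04-20 00:00:00
```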