I have to read tables where some columns may be dates. These tables are defined by the user so the format may change from one to another. For example, these would be valid formats:
'1998'
'2003-10-1'
'2003/5'
'2004/4/5 10:40'
Pandas pd.to_datetime() handles this really well, but when it is time to display the date I want to display just the part of the DateTime defined by the user. Continuing with the previous example:
'1998' --> '1998'
'2003-10-1' --> '2003-10-1'
'2003/5' --> '2003-05'
'2004/4/5 10:40' --> '2004-04-05T10:40'
What I am trying to avoid is, when a user just defines the year '1998'
, to display the full ISO DateTime '1998-01-01T00:00'
. I would like to know if pandas or python provide such functionality.
If I have to create my own function I guess I would force the user to follow the ISO 8601 and use a regex to extract the groups of the date.
EDIT:
Since it seems it is not possible to solve it using pandas I have decided to enforce the use of ISO8601 and use these functions:
import re
from pandas._libs.tslib import _test_parse_iso8601
def is_iso8601(string):
try:
_test_parse_iso8601(string)
return True
except ValueError:
return False
def capture_iso_date(text):
if not is_iso8601(text):
raise ValueError("Date format not valid")
date_parts = ["year", "month", "day", "hour", "minute", "second"]
regex = r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(?:[\s,T](\d{2}))?(?::(\d{2}))?(?::(\d{2}))?"
match = re.search(regex, text)
# This is not necessary, but solves linter problems
if match is None: raise ValueError("Date format not valid")
date_list = match.groups()
date_dict = {date_parts[i]: date_list[i] for i in range(len(date_list))}
return date_dict
def format_iso_date(date_dict):
ret: str = ""
if date_dict['year']: ret += date_dict['year']
else: return None
if date_dict['month']: ret += "-" + date_dict['month']
else: return ret
if date_dict['day']: ret += "-" + date_dict['day']
else: return ret
if date_dict['hour']: ret += " " + date_dict['hour']
else: return ret
if date_dict['minute']: ret += ":" + date_dict['minute']
else: return ret+':00'
if date_dict['second']: ret += ":" + date_dict['second']
else: return ret
return ret
format_iso_date(capture_iso_date("2021-03-28T12"))
Thanks to FObersteiner for pointing me this question: Datetime conversion - How to extract the inferred format?