1

I have to read tables where some columns may be dates. These tables are defined by the user so the format may change from one to another. For example, these would be valid formats:

'1998'
'2003-10-1'
'2003/5'
'2004/4/5 10:40'

Pandas pd.to_datetime() handles this really well, but when it is time to display the date I want to display just the part of the DateTime defined by the user. Continuing with the previous example:

'1998' --> '1998'
'2003-10-1' --> '2003-10-1'
'2003/5' --> '2003-05'
'2004/4/5 10:40' --> '2004-04-05T10:40'

What I am trying to avoid is, when a user just defines the year '1998', to display the full ISO DateTime '1998-01-01T00:00'. I would like to know if pandas or python provide such functionality.

If I have to create my own function I guess I would force the user to follow the ISO 8601 and use a regex to extract the groups of the date.

EDIT:

Since it seems it is not possible to solve it using pandas I have decided to enforce the use of ISO8601 and use these functions:

import re
from pandas._libs.tslib import _test_parse_iso8601

def is_iso8601(string):
    try:
        _test_parse_iso8601(string)
        return True
    except ValueError:
        return False

def capture_iso_date(text):
    if not is_iso8601(text):
        raise ValueError("Date format not valid")
    
    date_parts = ["year", "month", "day", "hour", "minute", "second"]
    regex = r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(?:[\s,T](\d{2}))?(?::(\d{2}))?(?::(\d{2}))?"
    match = re.search(regex, text)
    # This is not necessary, but solves linter problems
    if match is None: raise ValueError("Date format not valid")
    
    date_list = match.groups()
    date_dict = {date_parts[i]: date_list[i] for i in range(len(date_list))}
    return date_dict

def format_iso_date(date_dict):
    ret: str = ""
    if date_dict['year']: ret += date_dict['year']
    else: return None
    if date_dict['month']: ret += "-" + date_dict['month']
    else: return ret
    if date_dict['day']: ret += "-" + date_dict['day']
    else: return ret
    if date_dict['hour']: ret += " " + date_dict['hour']
    else: return ret
    if date_dict['minute']: ret += ":" + date_dict['minute']
    else: return ret+':00'
    if date_dict['second']: ret += ":" + date_dict['second']
    else: return ret
    return ret

format_iso_date(capture_iso_date("2021-03-28T12"))

Thanks to FObersteiner for pointing me this question: Datetime conversion - How to extract the inferred format?

edoelas
  • 317
  • 2
  • 13
  • No, pandas doesn't do this. Once converted to datetime the original information is lost. You might want to try to infer the format, but here it seems that you have to manually define mappings – mozway Mar 27 '23 at 13:13
  • There's no part defined by the user. `to_datetime` returns a full `datetime` value, not a year or month. You get `1998-01-01 00:00`, not the year `1998`. – Panagiotis Kanavos Mar 27 '23 at 13:13
  • What are you trying to do? 1998 isn't a date, it's a period. Pandas has a [Period](https://pandas.pydata.org/docs/reference/api/pandas.Period.html) class to represent this – Panagiotis Kanavos Mar 27 '23 at 13:14
  • 1
    related: [Datetime conversion - How to extract the inferred format?](https://stackoverflow.com/q/46842793/10197418). You could theoretically build something based on that but I'd consider it to be a (large) hack. Just use ISO format for output. Let the user *know* that you're working with datetime, not a period etc. – FObersteiner Mar 27 '23 at 14:50

0 Answers0