0

Is there a way to check if date values in a DataFrame column abide by 1 of N possible date formats, and no others?

The only acceptable formats are:

  1. YYYY-MM-DD H:M:S --> (with or without decimal places in the seconds field)
  2. YYYY-MM-DD H:M
  3. YYYY-MM-DD H
  4. YYYY-MM-DD

In each of the 4 cases, I want to append the required number of 0's to make the field whole. So, for example, "2020-11-28 12" becomes "2020-11-28 12:00:00", "2020-12-18 22:31" becomes "2020-12-18 22:31:00" etc.

I borrowed @cs95's answer from this post and it works like a charm for all 4 formats I mentioned above. This is what I wrote:

if pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S', errors='coerce').notnull().all():
    df['DATE'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
    #something

However, the problem is that it also allows for fields with just YYYY and YYYY-MM. For instance, it'll accept "2020" as a valid field and make it "2020-01-01 00:00:00".

I want to make sure the allowance for missing fields is strictly capped at YYYY-MM-DD (i.e. YYYY and YYYY-MM are rejected). Can this be done using the method I've used or do I have to try something else? Any help is appreciated here!

divij_o
  • 125
  • 8
  • 1
    you could test if it can be parsed by Python [datetime.fromisoformat](https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat), that would cover all the formats you specified. – FObersteiner Dec 21 '20 at 17:53
  • @MrFuppes I tried using datetime.fromisoformat, but it wouldn't parse dates where the second value has decimals past milliseconds. For ex: datetime.fromisoformat('2011-11-11 13:11:11.2332') throws a ValueError (Invalid isoformat string) – divij_o Dec 24 '20 at 05:30
  • Ideally, I want to allow for any number of decimal places in the seconds field – divij_o Dec 24 '20 at 05:35
  • 1
    Then I suppose you'll have to test specific strptime directives - not sure if `%f` parses 4-digit fractional seconds but you could give it a try – FObersteiner Dec 24 '20 at 08:56

0 Answers0