Is there a way to check if date values in a DataFrame column abide by 1 of N possible date formats, and no others?
The only acceptable formats are:
- YYYY-MM-DD H:M:S --> (with or without decimal places in the seconds field)
- YYYY-MM-DD H:M
- YYYY-MM-DD H
- YYYY-MM-DD
In each of the 4 cases, I want to append the required number of 0's to make the field whole. So, for example, "2020-11-28 12" becomes "2020-11-28 12:00:00", "2020-12-18 22:31" becomes "2020-12-18 22:31:00" etc.
I borrowed @cs95's answer from this post and it works like a charm for all 4 formats I mentioned above. This is what I wrote:
if pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S', errors='coerce').notnull().all():
df['DATE'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
#something
However, the problem is that it also allows for fields with just YYYY and YYYY-MM. For instance, it'll accept "2020" as a valid field and make it "2020-01-01 00:00:00".
I want to make sure the allowance for missing fields is strictly capped at YYYY-MM-DD (i.e. YYYY and YYYY-MM are rejected). Can this be done using the method I've used or do I have to try something else? Any help is appreciated here!