I'm looking at files of data. The data always has some form of timestamp that is consistent throughout a given file, but the format varies widely from file to file.
A given file often starts with something like 01/02/17. It is easy to guess that the year is last but is it DMY or MDY? Some later timestamp might be 27/02/17 which is unambiguous, how can I then go back and make sure 01/02/17 was parsed DMY, or MDY if I later found 01/31/17?
I tried to do a wrapper around dateutil.parser but I could not find a way to get any format information back (even just that the format was ambiguous and so the default order was used)
Somethings that I have attempted: 1)
DATETIMEFORMAT = '%m/%d/%y %H:%M'
df[DATETIME] = pd.to_datetime(df[DATETIME],format=DATETIMEFORMAT)
where I have to go in and manually change DATETIMEFORMAT for each file after looking at the file.
2)
datecol = []
for idx in range(df.shape[0]):
tmpdate = parser.parse(df[DATETIME][idx])
datecol.append(pd.to_datetime(tmpdate))
df[DATETIME] = datecol
using python's dateutil.parser, this will handle most timestamps correctly but I can't set a default order that will hold for all files. I was working to get format information out when I came across
[https://stackoverflow.com/questions/53892450/get-the-format-in-dateutil-parse][1]
- a combination of the two
for idx in range(df.shape[0]):
tmpdate = parser.parse(df[DATETIME][idx])
if not is_ambiguous(tmpdate):
formatstr = get_format_str(df,tmpdate,idx)
break
df[DATETIME] = pd.to_datetime(df[DATETIME],format=formatstr)
where I look for an unambiguous timestamp and try to use that to recreate the format string for pd.to_datetime().