How do you take a date, formatted randomly, and figure out what that format is?

I'm reading columns in files, and one of them is a date. However, I encounter many different formats, for example:

strdate = ["6/4/2014 12:55:36 AM", "2014-06-04 6:27:55 PM"], etc.

The best (fastest) way to parse the date appears to be dateutil.parser.parse(). However, I ALSO want to know the format of the string read in each file, so next time I encounter a date in a similar file, I don't have to use parse(). This should all be automatic; the file is read, a date pops out, and the format is saved so the next time a similar file is read, I can use datetime with a specified format.

How do you retrieve the format of a date?

ehacinom
  • I'm not sure I understand what you're trying to ask for – James Mertz Aug 05 '14 at 18:40
  • Sounds like he wants to use `dateutil.parser.parse()` one time, determine the specific format that the date used, and then directly use that format for another, faster date-processing function? – TheSoundDefense Aug 05 '14 at 18:43
  • Reimplement `parser.parse` to return the format that it successfully used to parse the string. You can download the source, but it's likely non-trivial. – Joran Beasley Aug 05 '14 at 18:43
  • You could possibly have a function that tries a list of `strptime` formats, and `try`/`except` each in turn until it succeeds, then return the format string from that. (similar to [this answer](http://stackoverflow.com/questions/23581128/how-to-format-date-string-via-multiple-formats-in-python/23581184#23581184) but returning the format string instead of the date) – Jon Clements Aug 05 '14 at 18:44
  • Are you sure this is even a bottleneck in your code? – Joran Beasley Aug 05 '14 at 18:44
  • Note that it's actually impossible to tell whether this is June 4th or April 6th. – Adam Smith Aug 05 '14 at 18:47
  • Yeah, it is impossible to tell. This isn't necessarily a bottleneck; it's okay. I just want to write specification files about the directory I'm searching. @JoranBeasley, that's what I was thinking about the source, but I don't want to try. – ehacinom Aug 05 '14 at 18:51
  • @jon-clements, that's good. It'll take some time now, but it won't be hard. I think I'll start on that. – ehacinom Aug 05 '14 at 18:51
  • I wrote a little date parser several months ago to go with an app I worked on that needed to be able to discern the correct date (always in month-first formatting) from user input. It was non-trivial, but I was able to get it to accept a huge variety of inputs. It's not impossible :) – Adam Smith Aug 05 '14 at 18:53
  • @rebecca if I were you, I'd start out by grabbing each "piece" of the puzzle, then introspecting to find out what part it is. Grab the input, split it on whitespace, look for something with colons and assume that's a time, look for something with slashes or dashes and assume it's a date, see if there's "AM" or "PM" next to the time, etc. Then look at what you've marked as your "Time" and see if you have something like "11:00" or "11:00:00", check your date out to see what the separator is, etc., then build your format string from your introspection. – Adam Smith Aug 05 '14 at 18:58
  • @AdamSmith, this would definitely be the right way :P I'm going to temporarily do it with a try except loop, and if there's more than, say, 10 formats, I'll do it the right way. – ehacinom Aug 05 '14 at 19:05
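
The piece-by-piece approach suggested in the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions: `guess_date_format` and its `day_first` flag are hypothetical names, it handles only numeric dates separated by `/` or `-` and colon-separated times, and the caller has to decide whether `6/4/2014` is month-first or day-first, since the string alone cannot say.

```python
from datetime import datetime

def guess_date_format(text, day_first=False):
    """Build a strptime format by inspecting whitespace-separated tokens.

    Hypothetical helper: handles numeric dates ("/" or "-") and
    colon-separated times only. `day_first` resolves the 6/4 vs 4/6 ambiguity.
    """
    tokens = text.split()
    has_meridiem = any(t.upper() in ("AM", "PM") for t in tokens)
    parts = []
    for token in tokens:
        if token.upper() in ("AM", "PM"):
            parts.append("%p")
        elif ":" in token:
            # 12-hour clock if an AM/PM marker is present, else 24-hour.
            hour = "%I" if has_meridiem else "%H"
            fields = [hour, "%M", "%S"][: token.count(":") + 1]
            parts.append(":".join(fields))
        elif "/" in token or "-" in token:
            sep = "/" if "/" in token else "-"
            pieces = token.split(sep)
            if len(pieces[0]) == 4:      # year first, e.g. 2014-06-04
                fields = ["%Y", "%m", "%d"]
            elif day_first:              # e.g. 4/6/2014 meaning 4 June
                fields = ["%d", "%m", "%Y"]
            else:                        # e.g. 6/4/2014 meaning June 4
                fields = ["%m", "%d", "%Y"]
            parts.append(sep.join(fields))
        else:
            raise ValueError("Unrecognized token: %r" % token)
    fmt = " ".join(parts)
    datetime.strptime(text, fmt)  # raises ValueError if the guess does not parse
    return fmt

print(guess_date_format("6/4/2014 12:55:36 AM"))   # %m/%d/%Y %I:%M:%S %p
print(guess_date_format("2014-06-04 6:27:55 PM"))  # %Y-%m-%d %I:%M:%S %p
```

The final `strptime` call is a sanity check, so a wrong guess surfaces as a `ValueError` rather than as a silently bad format string.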

1 Answer

Here's an example that uses the try/except approach. It returns a convenience formatter function, bound to the first format that parses, which can then be reused on subsequent strings:

from datetime import datetime

strdate = ["6/4/2014 12:55:36 AM", "2014-06-04 6:27:55 PM"]

def get_date_format(text):
    # %I (12-hour clock) is needed for %p to take effect; with %H, AM/PM is ignored.
    date_formats = (
        '%d/%m/%Y %I:%M:%S %p',  # day-first; use '%m/%d/%Y' if the files are month-first
        '%Y-%m-%d %I:%M:%S %p'
    ) # <--- adjust as necessary
    for date_format in date_formats:
        try:
            datetime.strptime(text, date_format)
        except ValueError:
            continue  # this format did not match; try the next one
        return lambda L: datetime.strptime(L, date_format)
    raise ValueError("No suitable formats found")

for item in strdate:
    formatter = get_date_format(item)
    print(formatter(item))

# 2014-04-06 00:55:36
# 2014-06-04 18:27:55
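
Since the question asks for the format itself so that it can be saved, one possible variation is to return the matching format string rather than a closure. This is a minimal sketch, not a definitive implementation: `detect_date_format` and `CANDIDATE_FORMATS` are hypothetical names, and the candidate list (day-first for the slash form) is an assumption to adjust for your own files.

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption; extend as needed).
# %I is paired with %p so that AM/PM is actually applied.
CANDIDATE_FORMATS = (
    "%d/%m/%Y %I:%M:%S %p",
    "%Y-%m-%d %I:%M:%S %p",
)

def detect_date_format(text, formats=CANDIDATE_FORMATS):
    """Return the first format string that parses `text`, or raise ValueError."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format; try the next candidate
        return fmt
    raise ValueError("No suitable format found for %r" % text)

# Detect once, then reuse the saved format on later values from a similar file.
fmt = detect_date_format("2014-06-04 6:27:55 PM")
print(fmt)                                              # %Y-%m-%d %I:%M:%S %p
print(datetime.strptime("2014-06-05 7:00:00 AM", fmt))  # 2014-06-05 07:00:00
```

Because the result is a plain string, it can be written to a specification file and passed straight to `datetime.strptime` the next time a similar file is read.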
Jon Clements