
I'm getting a date as a string, then parsing it into a datetime object. Is there any way to check what the date format of the object is?

Let's say that this is the object that I'm creating:

from dateutil import parser

modified_date = parser.parse("2015-09-01T12:34:15.601+03:00")

How can I print or get the exact date format of this object? I need it to verify that the date is in the correct format, so that I can compute the difference between today's date and the given date.

wim
Alex Brodov
  • This question doesn't make sense. If you're parsing it to a datetime object, then it's a datetime object, and doesn't have a format. What exactly do you think you need to compare? – Daniel Roseman Dec 03 '15 at 18:29
  • I'm getting a string and I want to compute the difference in days between today's date and the given string, but to do that I have to make sure both objects are in the same format, otherwise I get an exception – Alex Brodov Dec 03 '15 at 18:31
  • But you said you were converting it to a datetime object. What does `parser.parse` do? – Daniel Roseman Dec 03 '15 at 18:33
  • Do you want to verify that the string is in ISO format or what? I don't get it – Simone Zandara Dec 03 '15 at 18:35
  • The following code is not working: `modified_date = parser.parse("2015-09-01T12:34:15.601+03:00"); today = datetime.today(); diff_test = modified_date - today`. I'm getting an exception: `TypeError: can't subtract offset-naive and offset-aware datetimes`. It's probably related to the timezone, but I'm not sure – Alex Brodov Dec 03 '15 at 18:37
  • @DanielRoseman I believe that is python-dateutil parser – wim Dec 03 '15 at 18:43
  • `TypeError` is a different question (the answer: use timezone-aware datetime object for the current time e.g., [`datetime.now(utc)`](http://stackoverflow.com/a/25421145/4279)) – jfs Dec 04 '15 at 11:45
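A minimal sketch of the fix suggested in the last comment, assuming Python 3.7+ and swapping dateutil's `parser.parse` for the standard library's `datetime.fromisoformat`, which also returns an offset-aware object for this string. `days_since` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def days_since(date_string):
    """Hypothetical helper: whole days from an ISO 8601 timestamp to now."""
    # The "+03:00" suffix makes this an offset-aware datetime
    modified_date = datetime.fromisoformat(date_string)
    # Use an aware "now" too; datetime.today() is naive, and mixing naive
    # and aware datetimes is what raises the TypeError from the question
    now = datetime.now(timezone.utc)
    # Aware minus aware works across different offsets and yields a timedelta
    return (now - modified_date).days

print(days_since("2015-09-01T12:34:15.601+03:00"))
```

The same idea applies with dateutil: as long as both operands are aware, the subtraction works regardless of which parser produced them.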

2 Answers


I had a look at the source code and, unfortunately, python-dateutil doesn't expose the format. In fact, it doesn't even generate a guess for the format at all; it just goes ahead and parses. The code is a big nested spaghetti of conditionals.

You could have a look at dateinfer, which looks like what you're searching for, but these are unrelated libraries, so there is no guarantee at all that python-dateutil will parse with the same format that dateinfer suggests.

>>> from dateinfer import infer
>>> s = "2015-09-01T12:34:15.601+03:00"
>>> infer([s])
'%Y-%d-%mT%I:%M:%S.601+%m:%d'

Look at that .601: close, but no cigar. I think it has probably also mixed up the month and the day. You might get better results by giving it more than one date string to base the guess on.
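The idea of guessing from several date strings can be illustrated without dateinfer. This is a standard-library sketch, not dateinfer's algorithm: it keeps only the candidate `strptime` formats that parse every sample. The candidate list and the helper name `consistent_formats` are hypothetical.

```python
from datetime import datetime

def consistent_formats(samples, candidates):
    """Return the candidate formats that successfully parse every sample."""
    surviving = []
    for fmt in candidates:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue  # this format rejects at least one sample
        surviving.append(fmt)
    return surviving

candidates = ["%d-%m-%Y", "%m-%d-%Y"]
# A single ambiguous sample cannot decide between day-first and month-first
print(consistent_formats(["01-02-2017"], candidates))  # → ['%d-%m-%Y', '%m-%d-%Y']
# A second sample with 25 in the first field rules out month-first
print(consistent_formats(["01-02-2017", "25-12-2017"], candidates))  # → ['%d-%m-%Y']
```

Even with many samples, more than one format can survive if none of the samples happens to be unambiguous.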

wim
  • It seems the format isn't in the datetime convention. I've already created a method that returns the requested format: `def get_formatted_date(date_format, date_to_reformat): """ Reformatting a date object to a specific format :param date_format: String the desired format :param date_to_reformat: datetime The actual date :return: datetime The actual date """ date_str = date_to_reformat.strftime(date_format) return parser.parse(date_str) ` – Alex Brodov Dec 03 '15 at 19:00

i need this in order to verify that it's in the correct format

If you know the expected time format (or a set of valid time formats) then you could just parse the input using it: if it succeeds then the time format is valid (the usual EAFP approach in Python):

from datetime import datetime

def parse_date(date_string, valid_date_formats):  # hypothetical wrapper name
    for date_format in valid_date_formats:
        try:
            return datetime.strptime(date_string, date_format), date_format
        except ValueError:  # wrong date format
            pass  # try the next format
    raise ValueError("{date_string} is not in the correct format. "
                     "valid formats: {valid_date_formats}".format(**vars()))

Here's a complete code example (in Russian -- ignore the text, look at the code).
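Applied to the timestamp from the question, the EAFP check could look like this minimal sketch. It assumes Python 3.7+, where `%z` accepts an offset with a colon such as `+03:00`; the helper name `matches_format` is hypothetical.

```python
from datetime import datetime

def matches_format(date_string, date_format):
    """EAFP check: True if strptime accepts date_string in date_format."""
    try:
        datetime.strptime(date_string, date_format)
    except ValueError:
        return False
    return True

# %f accepts one to six digits, so ".601" is fine; %z handles "+03:00"
ISO_WITH_OFFSET = "%Y-%m-%dT%H:%M:%S.%f%z"
print(matches_format("2015-09-01T12:34:15.601+03:00", ISO_WITH_OFFSET))  # True
print(matches_format("01/09/2015", ISO_WITH_OFFSET))  # False
```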

If there are many valid date formats then to improve time performance you might want to combine them into a single regular expression or convert the regex to a deterministic or non-deterministic finite-state automaton (DFA or NFA).
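If the valid formats are fixed in advance, they can be folded into one compiled pattern. A minimal sketch, assuming the three formats in `DATE_FORMATS` (hypothetical, chosen so their regexes cannot overlap) and a hypothetical `parse_any` helper: each alternative is a named group, so `Match.lastgroup` reports which format matched, and `strptime` then does the conversion and range checks.

```python
import re
from datetime import datetime

# Hypothetical accepted formats: name -> (regex for the shape, strptime format)
DATE_FORMATS = {
    "iso_datetime": (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", "%Y-%m-%dT%H:%M:%S"),
    "iso_date": (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    "dmy": (r"\d{2}/\d{2}/\d{4}", "%d/%m/%Y"),
}

# One combined pattern: a single scan both checks the shape and names the format
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, (rx, _) in DATE_FORMATS.items())
)

def parse_any(date_string):
    match = COMBINED.fullmatch(date_string)
    if match is None:
        raise ValueError(f"{date_string!r} matches none of the accepted formats")
    fmt = DATE_FORMATS[match.lastgroup][1]
    # strptime still rejects values the regex lets through, e.g. month 13
    return datetime.strptime(date_string, fmt), fmt

print(parse_any("01/09/2015"))
```

Python's `re` module is a backtracking engine, so this is not literally a DFA; it only replaces the per-format `strptime` loop with a single regex scan.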

In general, if you need to extract dates from a larger text that is too varied to write parsing rules by hand, consider machine learning solutions, e.g., a named-entity recognition (NER) system such as webstruct (for HTML input).

jfs
  • "if it succeeds then the time format is valid" <-- valid, ok, but not necessarily correct! – wim Jun 02 '17 at 06:01
  • @wim: what is the difference between "valid" and "correct" in this case. Could you provide an example of a `date_string` that is valid but that is not correct? – jfs Jun 15 '17 at 22:53
  • Yes. If you have "01-02-2017" then both "%d-%m-%Y" and "%m-%d-%Y" are valid, but only one is correct. More context is needed, e.g. locale information, or using multiple data points. – wim Jun 15 '17 at 23:41
  • That makes sense. My answer assumes that the formats are sufficiently different, e.g., the formats in the link: `date_formats = '%B %d, %Y', '%b %d, %Y', '%Y-%B-%d'` (valid and correct are identical here). – jfs Jun 16 '17 at 00:02