2

I have a pandas dataframe which I have read from JSON, one date column is a strange timestamp format like follows

"/Date(1405961743000+0100)/"

. How can I convert the entire column into a python date?

I have been able to manually convert that date into a python date by using the datetime fromtimestamp function on the first 10 digits i.e. datetime.datetime.fromtimestamp(1405961743) but am struggling to convert the whole column.

I am guessing I need to select the appropriate digits from each entry, convert to integer, then use the fromtimestamp function, but I am new to python (and pandas) so am struggling to do this.

Any help would be appreciated.

Thanks

EdChum
  • 376,765
  • 198
  • 813
  • 562
user1165199
  • 6,351
  • 13
  • 44
  • 60

3 Answers3

1

Obviously it would be better if you know where the JSON comes from, and can look in the docs/ask the author/etc. to know what the actual intention is behind that date format. (It might even be generated by Python code, using a library you can just use yourself…)

But looking at the numbers, I can make a pretty good guess what this means: the 1405961743000 is milliseconds since the Unix epoch (which explains why you can use the first 10 digits of it as seconds since the Unix epoch, at least within a pretty wide range around 2014), and the +0100 is a timezone offset from GMT, in +HHMM format.

So, instead of extracting the first 10 digits, converting to int, and calling fromtimestamp, you'd want to extract everything up to the + or -, convert to int, divide by 1000, and call fromtimestamp. Although the fact that the only example you've given us happens to have 0 milliseconds implies that there's a good chance they all will, in which case this difference won't matter…

Anyway, it's then up to you what to do with the timezone offset. Do you want to store aware local datetimes? GMT datetimes? naive local datetimes? They're all pretty easy to get to from a timestamp and an offset (although "aware" is going to mean using a fake timezone like GMT-05:00, which doesn't have any historical or DST information, of course), but you have to decide which one you want.


Whatever you end up doing, you may want to consider extending your JSON decoder to automate it, as shown in the examples in the docs. (Any string that matches the regex r'/Date\((\d+)([+-]\d{4})\)/', the first group is the timestamp and the second the offset.)

But maybe not. Especially since parse_string doesn't seem to be overridable, at least as of 3.4, so it looks like you'd have to monkeypatch it. See this code I slapped together as a proof of concept; you might be able to make it a little nicer, but there's a limit to how clean you can make it if they didn't provide a hook for it…


PS, if you're ever extending JSON yourself, you may want to consider a more standardized and self-documenting way of doing this. The dict format shown in the json module docs, where you effectively specify a constructor to call and the arguments to pass it, is a lot easier for people to figure out (and to add a hook for). Or, alternatively, there's a quasi-standard way of encoding YAML formats as JSON formats, and YAML is extensible (and already has a standard timestamp extension).

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • 3
    The format is from older versions of ASP.NET or from WCF applications. It isn't the best format, but it is well known. It has been replaced with ISO8601 in most contexts, but it still used by a lot of code. – Matt Johnson-Pint Oct 23 '14 at 01:40
  • @MattJohnson: Thanks! Handy information. Knowing that should make it easier to search for existing code that parses these things, instead of having to guess at it and reinvent the wheel… – abarnert Oct 23 '14 at 01:44
  • `fromtimestamp()` without specifying timezone may be wrong if the local UTC offset doesn't correspond to the one in the time string. Here's [how to get an aware datetime object given this time format](http://ideone.com/4n33fj). It looks like my code is also wrong, the offset is number of minutes (i.e., it is *not* HHMM). – jfs Oct 24 '14 at 04:44
  • on the other hand [the docs](http://www.odata.org/documentation/odata-version-2-0/json-format) can lie, all examples I see use HHMM format instead of "minutes". – jfs Oct 24 '14 at 05:14
  • @J.F.Sebastian: Are those the right docs for this format? If not, it's not particularly surprising that two different standards for extending JSON to new types via special string formats would come up with very similar but not quite identical ways of doing it. For example, the dict format used by jsonpickle and by YAML-in-JSON are often identical, but sometimes not. (Which is exactly why having docs or a person to talk to or code is always better than trying to guess from a small sample of the format…) – abarnert Oct 24 '14 at 18:29
  • @J.F.Sebastian: Also, the right way to get an appropriate datetime from this depends on what he wants and what assumptions he can make, as I explained in this answer (and also on what version of Python he has and whether he's willing to use third-party libs); without any of that info, I don't know that writing code that he (or another searcher) may use without thinking about is a great idea, which is why I buried it in a pastebin. – abarnert Oct 24 '14 at 18:33
  • To be clear: 1. using fromtimestamp() without specifying timezone + replace(tzinfo) as you do is wrong if the local utc offset is not the same as the offset in the input date – it does not depend on Python version or anything else 2. The format is referenced by name in other places that interpret it as HHMM I.e., it is much more likely (I'd say, I'm certain) that it is an error in the docs. – jfs Oct 24 '14 at 22:57
  • @J.F.Sebastian: `datetime.timezone` doesn't exist before Python 3.2, so it very definitely does depend on the Python version. So, what do you do before 3.2? That depends. If you're reading local values, you can avoid the problem by creating a naive datetime and then awarifying it. If you're not, things are more complicated, but don't write the more complicated code if you can use 3.2+ or you know you have local times. – abarnert Oct 25 '14 at 00:22
  • @J.F.Sebastian: And that's all assuming you want an aware datetime in the first place. If your ultimate goal is to get GMT or original-local or my-local datetimes, then, while you _can_ do that by going through an aware datetime back to a different naive one, it may not be the easiest way—again, especially if you don't have 3.2+. That's why it's important to know what you're trying to do before you write the code to do it. And in this case, I have no idea what the OP is trying to do, which Python version he's using, or anything else relevant. So all I can do is explain, not write full code. – abarnert Oct 25 '14 at 00:24
  • @J.F.Sebastian: Finally, based on [this blog post](http://weblogs.asp.net/bleroy/dates-and-json) by one of the authors, it looks like this format was invented by Microsoft Ajax Library, well before OData was even a thing. It looks like OData was a later attempt to standardize some things that everyone had been doing ad-hoc (just until JSON Schema makes this all unnecessary any day now…), but most ASP.NET code is probably not doing it because of OData, so their docs don't matter, it's MSDN's (or the third-party JSON.NET, which is apparently what a lot of sites use). – abarnert Oct 25 '14 at 00:31
  • @abarnert: 1. your answer is wrong due to incorrect `fromtimestamp()` usage -- that part doesn't depend on Python version. It is not about naive vs. aware -- the result is wrong 2. It is trivial to [implement FixedOffset() tzinfo on earlier Python versions](http://stackoverflow.com/a/23122493/4279) – jfs Oct 25 '14 at 01:40
1

The time string is OData version 2 JSON verbose format for Datetime:

“/Date(<ticks>[“+” | “-” <offset>])/”
<ticks> = number of milliseconds since midnight Jan 1, 1970
<offset> = number of minutes to add or subtract

As @Matt Johnson mentions the format may be seen in ASP.NET or WCF applications.

#!/usr/bin/env python3
import re
from datetime import datetime, timedelta, timezone

time_string = "/Date(1405961743000+0100)/"
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
ticks, offset = re.match(r'/Date\((\d+)([+-]\d{4})?\)/$', time_string).groups()
utc_dt = epoch + timedelta(milliseconds=int(ticks))
print(utc_dt, utc_dt.strftime('%Z'))
if offset:
   offset = int(offset)
   # http://www.odata.org/documentation/odata-version-2-0/json-format
   # says offset is minutes (an error?)
   dt = utc_dt.astimezone(timezone(timedelta(minutes=offset)))
   print(dt, dt.strftime('%Z'))
   # but it looks like it could be HHMM
   hours, minutes = divmod(abs(offset), 100)
   if offset < 0:
      hours, minutes = -hours, -minutes
   dt = utc_dt.astimezone(timezone(timedelta(hours=hours, minutes=minutes)))
   print(dt, dt.strftime('%Z'))

Output

2014-07-21 16:55:43+00:00 UTC+00:00
2014-07-21 18:35:43+01:40 UTC+01:40
2014-07-21 17:55:43+01:00 UTC+01:00

It looks like the odata.org docs should be ignored and the offset should be treated as HHMM format.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670
1

I recently had to solve the same problem. Fortunately, I found this post and managed to adapt the solution to format all the dates in a JSON file, coming from an OData-based API of MS Project Online. Thank you very much, @abarnert and @jfs! Here's the code I used:

    from datetime import datetime, timedelta, timezone
    import re


    def ms_to_timestamp(
        milliseconds: str, 
        offset: str, 
        date_format: str = "%Y-%m-%d %H:%M:%S"
    ) -> str:
        epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
        utc_dt = epoch + timedelta(milliseconds=int(milliseconds))
        timestamp = utc_dt.strftime(date_format)

        if offset:
            offset = int(offset)
            hours, minutes = divmod(abs(offset), 100)

            if offset < 0:
                hours, minutes = -hours, -minutes

            dt = utc_dt.astimezone(timezone(timedelta(hours=hours, minutes=minutes)))
            timestamp = dt.strftime(date_format)

        return timestamp


    def format_datetime(string: str) -> str:
        dates = re.findall(r'Date\((\d+)([+-]\d{4})?\)', string)

        for d in dates:
            timestamp = ms_to_timestamp(milliseconds=d[0], offset=d[1])
            string = string.replace(f"/Date({d[0]}{d[1]})/", timestamp)

        return string


    if __name__ == '__main__':
        text = "example 1: /Date(1615885200000)/ ; example 2: /Date(1405961743000+0100)/"
        new_text = format_datetime(text)
        print(f"Before: {text}\nAfter: {new_text}")

    # Output
    # ------
    # Before: example 1: /Date(1615885200000)/ ; example 2: /Date(1405961743000+0100)/
    # After: example 1: 2021-03-16 09:00:00 ; example 2: 2014-07-21 17:55:43
joao8tunes
  • 81
  • 5