
I wrote a class that would allow me to add days (integers) to dates (string %Y-%m-%d). The objects of this class need to be JSON serializable.

Adding days in the form of integers to my objects works as expected. However, json.dumps(obj) returns too much information ("2016-03-23 15:57:47.926362") for my original object. Why? How would I need to modify the class to get "2016-03-23" instead? Please see the example below.

Code:

from datetime import datetime, timedelta
import json

class Day(str):
    def __init__(self, _datetime):
        self.day = _datetime

    def __str__(self):
        return self.day.date().isoformat()

    def __repr__(self):
        return "%s" % self.day.date().isoformat()

    def __add__(self, day):
        new_day = self.day + timedelta(days=day)
        return Day(new_day).__str__()

    def __sub__(self, day):
        new_day = self.day - timedelta(days=day)
        return Day(new_day).__str__()


if __name__ == "__main__":
    today = Day(datetime.today())
    print(today)               # 2016-03-23
    print(json.dumps(today))   # "2016-03-23 15:57:47.926362"
    print(today+1)             # 2016-03-24
    print(json.dumps(today+1)) # "2016-03-24"
    print(today-1)             # 2016-03-22
    print(json.dumps(today-1)) # "2016-03-22"

Update. Here's my final code for those interested:

from datetime import datetime, timedelta
import json


class Day(str):
    def __init__(self, datetime_obj):
        self.day = datetime_obj

    def __new__(cls, datetime_obj):
        return str.__new__(cls, datetime_obj.date().isoformat())

    def __add__(self, day):
        new_day = self.day + timedelta(days=day)
        return Day(new_day)

    def __sub__(self, day):
        new_day = self.day - timedelta(days=day)
        return Day(new_day)


if __name__ == "__main__":
    today = Day(datetime.today())
    print(type(today))
    print(today)  # 2016-03-23
    print(json.dumps(today))  # "2016-03-23"
    print(today + 1)  # 2016-03-24
    print(json.dumps(today + 1))  # "2016-03-24"
    print(today - 1)  # 2016-03-22
    print(json.dumps(today - 1))  # "2016-03-22"
    print(json.dumps(dict(today=today))) # {"today": "2016-03-23"}
    print(json.dumps(dict(next_year=today+365))) # {"next_year": "2017-03-23"}
    print(json.dumps(dict(last_year=today-366))) # {"last_year": "2015-03-23"}
Jakub Czaplicki

2 Answers


Cool! Let's go with it. You are seeing:

print(json.dumps(today))   # "2016-03-23 15:57:47.926362"

Because somewhere in the encoding process, when deciding how to serialize what was passed to it, json.dumps calls isinstance(..., str) on your object. This returns True, so your object is serialized as the string it secretly is.

But where does the "2016-03-23 15:57:47.926362" value come from?

When you call day = Day(datetime_obj), two things happen:

  • __new__ is called to instantiate the object. You haven't provided a __new__ method, so str.__new__ is used.
  • __init__ is called to initialize the object.

So day = Day(datetime_obj) effectively translates to:

day = str.__new__(Day, datetime_obj)

For json.dumps, your object will be a str, but the value of the str is set to the default string representation of datetime_obj. Which happens to be the full format you are seeing. Builtins, man!
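A minimal sketch of that effect, using a reduced version of the question's Day class. The fixed timestamp is an arbitrary value chosen for illustration, and this was written with Python 3 in mind:

```python
from datetime import datetime
import json


class Day(str):
    # No __new__ here, so str.__new__(Day, _datetime) sets the string value.
    def __init__(self, _datetime):
        self.day = _datetime

    def __str__(self):
        return self.day.date().isoformat()


moment = datetime(2016, 3, 23, 15, 57, 47, 926362)  # arbitrary example value
today = Day(moment)

underlying = str.__str__(today)  # the value stored by str.__new__
print(underlying)                # the full timestamp
print(str(today))                # the date only, via Day.__str__
print(json.dumps(today))         # the encoder works on the stored value
```

The stored value and the result of Day.__str__ disagree, and json.dumps only ever sees the former.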

I played around with this, and it seems that if you roll your own __new__ (which is slightly exciting territory, tread carefully) which intercepts the str.__new__ call, you should be fine:

class Day(str):
    def __new__(cls, datetime_obj):
        # Store only the ISO date as the underlying string value.
        return str.__new__(cls, datetime_obj.date().isoformat())

But you didn't hear it from me if the whole thing catches fire.

PS The proper way would be to subclass JSONEncoder. But there is zero fun in it.
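For comparison, here is a rough sketch of what the JSONEncoder route might look like. PlainDay and DayEncoder are hypothetical names invented for this example. Note that PlainDay deliberately does not subclass str, because JSONEncoder.default() is only consulted for objects the encoder cannot serialize on its own:

```python
from datetime import datetime, timedelta
import json


class PlainDay:
    """Hypothetical non-str variant of Day, used only for this sketch."""

    def __init__(self, datetime_obj):
        self.day = datetime_obj

    def __str__(self):
        return self.day.date().isoformat()

    def __add__(self, days):
        return PlainDay(self.day + timedelta(days=days))


class DayEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json does not know how to serialize.
        if isinstance(obj, PlainDay):
            return str(obj)
        return super().default(obj)


start = PlainDay(datetime(2016, 3, 23, 15, 57, 47))  # arbitrary example value
encoded = json.dumps({"today": start, "tomorrow": start + 1}, cls=DayEncoder)
print(encoded)
```

The price is that every json.dumps call has to pass cls=DayEncoder, which is presumably the "zero fun" part.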

PS2 Oh, shoot, I tested this on 2.7. I may be completely out there, and if I am, just give me a "you tried" badge.

maligree
    You're a star! I did experiment with __new__ but without much luck. You had one mistake in the suggested code, which I edited. Your answer is indeed a great explanation. Thanks! – Jakub Czaplicki Mar 23 '16 at 21:51

The reason for json.dumps(today)'s behavior is not as obvious as it might appear at first glance. To understand the issue, you should be able to answer two questions:

  • where does the string value that includes the time come from?
  • why is Day.__str__ not called by the JSON encoder? Should it be?

Here are some prerequisites:

  1. The datetime.today() method is similar to datetime.now(): it includes the current time (hours, minutes, etc.). You could use date.today() to get only the date.

  2. str creates immutable objects in Python; the value is set in the __new__ method, which you have not overridden, so the default conversion str(datetime.today()) is used to initialize Day's value as a string. In your case, that produces a string that includes both the date and the time. You could override __new__ to get a different string value:

    def __new__(cls, _datetime):
        return str.__new__(cls, _datetime.date())
    
  3. Day is a str subclass, and therefore its instances are encoded as JSON strings.

  4. str methods return str objects instead of the corresponding subclass objects unless you override them, e.g.:

    >>> class S(str):
    ...    def duplicate(self):
    ...        return S(self * 2)
    ...
    >>> s = S('abc')
    >>> s.duplicate().duplicate()
    'abcabcabcabc'
    >>> s.upper().duplicate()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'duplicate'
    

    s.upper() returns a str object instead of an S here, so the following .duplicate() call fails.

In your case, to create the corresponding JSON string, json.dumps(today) performs an operation on the today object (a re.sub() call in json.encoder.encode_basestring()) that uses its value as a string. The issue is that neither re.sub() nor encode_basestring() calls the __str__() method on instances of str subclasses. Even if encode_basestring(s) were as simple as return '"' + s + '"', the result would be the same: '"' + today returns a str object and Day.__str__ is not called.

I don't know whether the re module should call str(obj) in functions that accept objects for which isinstance(obj, str) is true, or whether json.encoder.encode_basestring() should do it (or neither).

If you can't fix the Day class, you could patch json.encoder.encode_basestring() to call str(obj) to get the desired JSON representation for str subtype instances (if you want the value returned by the __str__() method, putting aside whether it is wise to override __str__() on a str subclass in the first place):
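The concatenation point can be checked in isolation. In this small sketch, Loud is a hypothetical str subclass made up for the demonstration, and the quoting line stands in for the simplified encoder described above, not for the real json implementation:

```python
import json


class Loud(str):
    """Hypothetical str subclass whose __str__ differs from its stored value."""

    def __str__(self):
        return "from __str__"


value = Loud("stored value")

quoted = '"' + value + '"'    # plain str concatenation on the stored value
print(type(quoted).__name__)  # the result is a plain str, not Loud
print(quoted)                 # built from the stored value
print(str(value))             # only an explicit str() call reaches __str__
print(json.dumps(value))      # json also works on the stored value
```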

import json

for suffix in ['', '_ascii']:
    function_name = 'encode_basestring' + suffix
    orig_function = getattr(json.encoder, function_name)
    setattr(json.encoder, function_name, lambda s,_e=orig_function: _e(str(s)))

Related Python issue: Cannot override JSON encoding of basic type subclasses

jfs