0

How to convert all the string dates to Timestamp while loading a JSON?

The solution should work for keys and list of values as well. Solutions mentioned over here How to convert to a Python datetime object with JSON.loads? only work for string values and don't convert all the cases for the example json string given below:

{
    "single_date": "10/02/2020",
    "list_of_dates": ["10/03/2020", "10/04/2020"],
    "10/05/2020": "date_as_key",
    "nested":{
        "single_date": "10/02/2020",
        "list_of_dates": ["11/03/2020", "11/04/2020"],
        "11/05/2020": "date_as_key"
    }
}

Expected Output:

{
    'single_date': Timestamp('2020-10-02 00:00:00'),
    'list_of_dates': [Timestamp('2020-10-03 00:00:00'), Timestamp('2020-10-04 00:00:00')],
    Timestamp('2020-10-05 00:00:00'): 'date_as_key',
    'nested': {
        'single_date': Timestamp('2020-10-02 00:00:00'),
        'list_of_dates': [Timestamp('2020-11-03 00:00:00'), Timestamp('2020-11-04 00:00:00')],
        Timestamp('2020-11-05 00:00:00'): 'date_as_key'
    }
}

Update:

In the output Timestamp is the pandas timestamp but other datetime objects would work too. So, alternate output can also be:

{
    'single_date': datetime.datetime(2020, 10, 2, 0, 0),
    'list_of_dates': [datetime.datetime(2020, 10, 3, 0, 0),datetime.datetime(2020, 10, 4, 0, 0)],
    datetime.datetime(2020, 10, 5, 0, 0): 'date as key',
    'nested': {
        'single_date': datetime.datetime(2020, 10, 2, 0, 0),
        'list_of_dates': [datetime.datetime(2020, 11, 3, 0, 0),datetime.datetime(2020, 11, 4, 0, 0)],
        datetime.datetime(2020, 11, 5, 0, 0): 'date as key'
    }
}
Chayan Bansal
  • 1,857
  • 1
  • 13
  • 23
  • Is your data strongly typed? Perhaps you would prefer to work with classes instead of dict objects? – rv.kvetch Mar 05 '22 at 15:30
  • 1
    No, actually the schema of the data is also not fixed. So using a single class with attributes won't work. – Chayan Bansal Mar 05 '22 at 16:03
  • 1
    What is `Timestamp`? This isn't a standard python class. – Code-Apprentice Mar 05 '22 at 16:06
  • agreed, can you explain more what a `Timestamp` is? If you're looking to just get a string value that looks like that, you can just use some string manipulation techniques, should be pretty straightforward enough. – rv.kvetch Mar 05 '22 at 18:56
  • 1
    The idea was to convert the string to any usable datetime like object. I have updated the question with more details. Thanks! – Chayan Bansal Mar 05 '22 at 19:34

1 Answers1

2

The simplest (and most performant) option without using external libraries like pandas would be to define a custom hook to use along with json.loads:

from datetime import date


def date_hook(obj):
    t = type(obj)

    if t is dict:
        return {date_hook(k): date_hook(v) for k, v in obj.items()}
    if t is list:
        return [date_hook(elem) for elem in obj]
    if t is str:
        # best effort to check if it matches date format
        if '/' in obj and len(obj) == 10:
            # noinspection PyBroadException
            try:
                # format: %m/%d/%Y
                month, day, year = map(int, obj.split('/', 2))
                return date(year, month, day)
            except Exception:
                pass
        return obj
    return obj

Now do a quick test by loading sample data:

from json import loads
from pprint import pprint


string = """
{
    "single_date": "10/02/2020",
    "list_of_dates": ["10/03/2020", "10/04/2020"],
    "10/05/2020": "date_as_key",
    "nested":{
        "single_date": "10/02/2020",
        "list_of_dates": ["11/03/2020", "11/04/2020"],
        "11/05/2020": "date_as_key"
    }
}
"""

pprint(loads(string, object_hook=date_hook))

Result:

{datetime.date(2020, 10, 5): 'date_as_key',
 'list_of_dates': [datetime.date(2020, 10, 3), datetime.date(2020, 10, 4)],
 'nested': {datetime.date(2020, 11, 5): 'date_as_key',
            'list_of_dates': [datetime.date(2020, 11, 3),
                              datetime.date(2020, 11, 4)],
            'single_date': datetime.date(2020, 10, 2)},
 'single_date': datetime.date(2020, 10, 2)}

If curious, this actually comes out as 100x faster than an approach with the pandas library.

n = 1000
print('datetime.date:  ', timeit('loads(string, object_hook=date_hook)', globals=globals(), number=n))
print('pandas:         ', timeit('loads(string, object_hook=pd_date_hook)', globals=globals(), number=n))

Results on my PC (Windows):

datetime.date:   0.017586400000000002
pandas:          1.8995370999999999
rv.kvetch
  • 9,940
  • 3
  • 24
  • 53