
I'm trying to JSON encode a complex numpy array, and I've found a utility from astropy (http://astropy.readthedocs.org/en/latest/_modules/astropy/utils/misc.html#JsonCustomEncoder) for this purpose:

import json
import numpy as np

class JsonCustomEncoder(json.JSONEncoder):
    """ <cropped for brevity> """
    def default(self, obj):
        if isinstance(obj, (np.ndarray, np.number)):
            return obj.tolist()
        elif isinstance(obj, complex):  # np.complex was an alias for the builtin complex
            return [obj.real, obj.imag]
        elif isinstance(obj, set):
            return list(obj)
        elif isinstance(obj, bytes):  # pragma: py3
            return obj.decode()
        return json.JSONEncoder.default(self, obj)

This works well for a complex numpy array:

test = {'some_key':np.array([1+1j,2+5j, 3-4j])}

As dumping yields:

encoded = json.dumps(test, cls=JsonCustomEncoder)
print(encoded)
>>> {"some_key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}

The problem is, I don't see a way to read this back into a complex array automatically. For example:

json.loads(encoded)
>>> {"some_key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}

Can you help me figure out how to override loads/decoding so that it infers that this must be a complex array? I.e., instead of a list of 2-element items, it should put these back into a complex array. A decoder doesn't have a default() method to override, and the docs on decoding have too much jargon for me.

hpaulj
Adam Hughes

4 Answers


Here is my final solution, adapted from hpaulj's answer here and from his answer to this related question: https://stackoverflow.com/a/24375113/901925

This will encode/decode arrays of any datatype, nested to arbitrary depth inside dictionaries.

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """
        if input object is a ndarray it will be converted into a dict holding dtype, shape and the data base64 encoded
        """
        if isinstance(obj, np.ndarray):
            # b64encode returns bytes, which json can't serialize; decode to str
            data_b64 = base64.b64encode(obj.data).decode('ascii')
            return dict(__ndarray__=data_b64,
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)


def json_numpy_obj_hook(dct):
    """
    Decodes a previously encoded numpy ndarray
    with proper shape and dtype
    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

# Overload dump/load to default use this behavior.
def dumps(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dumps(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)    
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dump(*args, **kwargs)

def load(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.load(*args, **kwargs)

if __name__ == '__main__':

    data = np.arange(3, dtype=complex)  # np.complex was removed from NumPy; use the builtin

    one_level = {'level1': data, 'foo':'bar'}
    two_level = {'level2': one_level}

    dumped = dumps(two_level)
    result = loads(dumped)

    print('\noriginal data', data)
    print('\nnested dict of dict complex array', two_level)
    print('\ndecoded nested data', result)

Which yields output:

original data [ 0.+0.j  1.+0.j  2.+0.j]

nested dict of dict complex array {'level2': {'level1': array([ 0.+0.j,  1.+0.j,  2.+0.j]), 'foo': 'bar'}}

decoded nested data {'level2': {'level1': array([ 0.+0.j,  1.+0.j,  2.+0.j]), 'foo': 'bar'}}
Adam Hughes
    I think for more recent versions of Python 3, you need to change `json.JSONEncoder` to `json.JSONEncoder.default`. – Brian Jul 08 '19 at 21:53

The accepted answer is great but has a flaw. It only works if your data is C_CONTIGUOUS. If you transpose your data, that will not be true. For example, test the following:

A = np.arange(10).reshape(2,5)
A.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False
A = A.transpose()
#array([[0, 5],
#       [1, 6],
#       [2, 7],
#       [3, 8],
#       [4, 9]])
loads(dumps(A))
#array([[0, 1],
#       [2, 3],
#       [4, 5],
#       [6, 7],
#       [8, 9]])
A.flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False

To fix this, wrap the array in `np.ascontiguousarray()` before passing its buffer to `b64encode`. Specifically, change:

data_b64 = base64.b64encode(obj.data)

TO:

data_b64 = base64.b64encode(np.ascontiguousarray(obj).data)

If I understand the function correctly, `np.ascontiguousarray()` takes no action if your data is already C_CONTIGUOUS, so the only performance hit comes when you have F_CONTIGUOUS data.
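The fix can be verified with a quick round-trip check. This is a self-contained sketch that re-implements the accepted answer's scheme with the contiguity fix applied; the `encode`/`decode` helper names are just for illustration:

```python
import base64
import json

import numpy as np

def encode(obj):
    # same scheme as the accepted answer, with np.ascontiguousarray() added
    return json.dumps(dict(
        __ndarray__=base64.b64encode(np.ascontiguousarray(obj).data).decode('ascii'),
        dtype=str(obj.dtype),
        shape=obj.shape))

def decode(s):
    dct = json.loads(s)
    data = base64.b64decode(dct['__ndarray__'])
    return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])

A = np.arange(10).reshape(2, 5).transpose()  # F_CONTIGUOUS view
B = decode(encode(A))
print(np.array_equal(A, B))  # True
```

Without the `np.ascontiguousarray()` call, the same round trip would reproduce the scrambled array shown above.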

proximous

It's unclear just how much help you need with json encoding/decoding, or with working with numpy. For example, how did you create the complex array in the first place?

What your encoding has done is render the array as a list of lists. The decoder then has to convert that back to an array of the appropriate dtype. For example:

d = json.loads(encoded)
a = np.dot(d['some_key'],np.array([1,1j]))
# array([ 1.+1.j,  2.+5.j,  3.-4.j])

This isn't the only way to create such an array from this list, and it probably fails with more general shapes, but it's a start.
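Since `np.dot` sums over the last axis of its first argument, the same trick does extend beyond 1d, e.g. to a 2x2 grid of [real, imag] pairs:

```python
import numpy as np

pairs = [[[1.0, 1.0], [2.0, 5.0]],
         [[3.0, -4.0], [0.0, 2.0]]]       # 2x2 grid of [real, imag] pairs
arr = np.dot(pairs, np.array([1, 1j]))    # collapses the last axis
print(arr.shape, arr.dtype)               # (2, 2) complex128
```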

The next task is figuring out when to use such a routine. If you know you are going to receive such an array, then just do this decoding.

Another option is to add one or more keys to the dictionary that mark this variable as a complex nparray. One key might also encode its shape (though that is also deducible from the nesting of the list of lists).

Does this point in the right direction? Or do you need further help with each step?


One of the answers to this 'SimpleJSON and NumPy array' question

https://stackoverflow.com/a/24375113/901925

handles both the encoding and decoding of numpy arrays. It encodes a dictionary with the dtype and shape, and the array's data buffer. So the JSON string does not mean much to a human. But does handle general arrays, including ones with complex dtype.

The expected array and the dumped string print as:

[ 1.+1.j  2.+5.j  3.-4.j]

{"dtype": "complex128", "shape": [3], 
    "__ndarray__": "AAAAAAAA8D8AAAAAAADwPwAAAAAAAABAAAAAAAAAFEAAAAAAAAAIQAAAAAAAABDA"}

The custom decoding is done with an object_hook function, which takes a dict and returns an array (if possible).

json.loads(dumped, object_hook=json_numpy_obj_hook)

Following that model, here's a crude hook that would transform every JSON array into a np.array, and every one with 2 columns into a 1d complex array:

def numpy_hook(dct):
    jj = np.array([1,1j])
    for k,v in dct.items():
        if isinstance(v, list):
            v = np.array(v)
            if v.ndim==2 and v.shape[1]==2:
                v = np.dot(v,jj)
            dct[k] = v
    return dct

It would be better, I think, to encode some dictionary key to flag a numpy array, and another to flag a complex dtype.
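That flagging idea might look like the following sketch (the tag names `__complex_ndarray__` and `__ndarray__` are made up for illustration, not part of any standard):

```python
import json

import numpy as np

class TaggedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            if np.iscomplexobj(obj):
                # flag complex arrays explicitly instead of guessing at decode time
                return {'__complex_ndarray__': [obj.real.tolist(), obj.imag.tolist()]}
            return {'__ndarray__': obj.tolist()}
        return json.JSONEncoder.default(self, obj)

def tagged_hook(dct):
    if '__complex_ndarray__' in dct:
        re, im = dct['__complex_ndarray__']
        return np.array(re) + 1j * np.array(im)
    if '__ndarray__' in dct:
        return np.array(dct['__ndarray__'])
    return dct

s = json.dumps({'a': np.array([1+1j, 2+5j, 3-4j])}, cls=TaggedEncoder)
out = json.loads(s, object_hook=tagged_hook)
print(out['a'])  # [1.+1.j 2.+5.j 3.-4.j]
```

With explicit tags there is no guessing: plain lists stay lists, and only tagged dicts become arrays.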


I can improve the hook to handle regular lists, and other array dimensions:

def numpy_hook(dct):
    jj = np.array([1,1j])
    for k,v in dct.items():
        if isinstance(v, list):
            # try to turn list into numpy array
            v = np.array(v)
            if v.dtype==object:
                # not a normal array, don't change it
                continue
            if v.ndim>1 and v.shape[-1]==2:
                # guess it is a complex array
                # this information should be more explicit
                v = np.dot(v,jj)
            dct[k] = v
    return dct

It handles this structure:

A = np.array([1+1j,2+5j, 3-4j])
B = np.arange(12).reshape(3,4)
C = A+B.T
test = {'id': 'stream id',
        'arrays': [{'A': A}, {'B': B}, {'C': C}]}

returning:

{'arrays': [{'A': array([ 1.+1.j,  2.+5.j,  3.-4.j])}, 
       {'B': array([[ 0,  1,  2,  3],
                     [ 4,  5,  6,  7],
                     [ 8,  9, 10, 11]])}, 
       {'C': array([[  1.+1.j,   6.+5.j,  11.-4.j],
                     [  2.+1.j,   7.+5.j,  12.-4.j],
                     [  3.+1.j,   8.+5.j,  13.-4.j],
                     [  4.+1.j,   9.+5.j,  14.-4.j]])}], 
 'id': 'stream id'}

Any more generality requires, I think, modifications to the encoding to make the array identity explicit.

hpaulj
  • Thanks! I will play with your solution and read more carefully, but to follow up on some questions you asked: We're trying to serialize the results of a large simulation program, so depending on a ton of input parameters, which arrays are serialized will change dynamically. Therefore, I can't really predict when a complex array will need to be deserialized; that's why I need the decoder to just kind of work. It does look like your hook suggestion is probably what I need. Let me play with it, then I will come back and accept the answer. – Adam Hughes Jan 13 '15 at 17:26
  • Let me also clarify that I'm probably going to have a lot of dictionaries of dictionaries where the array is buried kind of deep down. For example, I might access my arrays like: simulation1.layer1.material3.earray, simulation2.optics.reflectance_array. So whatever the best solution is, it would need to deserialize a deeply nested dictionary where the arrays are buried at various levels. That's why I'd hoped the decoder would just be able to magically figure it out :) – Adam Hughes Jan 13 '15 at 17:33
  • Then your encoding needs to clearly identify the objects that should be transformed back to arrays and complex arrays. – hpaulj Jan 13 '15 at 19:19
  • Actually, I was able to get the behavior working fine from your answer in the link you attached. I put an edit to the answer you posted to show how I implemented your suggestion from that other thread, but it's under review. – Adam Hughes Jan 13 '15 at 19:22
  • You can post your own answer. That's probably better than making a significant change to mine. – hpaulj Jan 13 '15 at 20:08
  • Ok. I didn't want to gyp you out of rep. – Adam Hughes Jan 13 '15 at 20:09

Try traitschema https://traitschema.readthedocs.io/en/latest/

"Create serializable, type-checked schema using traits and Numpy. A typical use case involves saving several Numpy arrays of varying shape and type."

See to_json()

"This uses a custom JSON encoder to handle numpy arrays but could conceivably lose precision. If this is important, please consider serializing in HDF5 format instead"

SemanticBeeng