
What is the most efficient way of serializing a numpy array using simplejson?

epoch
  • [Related](http://stackoverflow.com/questions/11561932/why-does-json-dumpslistnp-arange5-fail-while-json-dumpsnp-arange5-tolis) and [simple solution](http://stackoverflow.com/questions/8230315/python-sets-are-not-json-serializable) by explicitly passing a [default handler](http://docs.python.org/2/library/json.html#json.dumps) for non-serializable objects. – 0 _ Aug 22 '13 at 05:39
  • Yet another answer here: http://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable/32850511#32850511 – travelingbones Sep 29 '15 at 18:36

9 Answers


To preserve the dtype and dimensions, try this:

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):

    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict 
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            data_b64 = base64.b64encode(obj_data)
            # b64encode returns bytes; decode so the result is JSON serializable on Python 3
            return dict(__ndarray__=data_b64.decode('ascii'),
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return super(NumpyEncoder, self).default(obj)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(100, dtype=np.float64)
dumped = json.dumps(expected, cls=NumpyEncoder)
result = json.loads(dumped, object_hook=json_numpy_obj_hook)


# None of the following assertions will be broken.
assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.allclose(expected, result), "Wrong Values"
tlausch
  • Agreed, this solution works in general for nested arrays, i.e. a dictionary of dictionaries of arrays. http://stackoverflow.com/questions/27909658/json-encoder-and-decoder-for-complex-numpy-arrays/27913569#27913569 – Adam Hughes Jan 13 '15 at 17:49
  • Can you adopt this to work with recarrays? `dtype=str(obj.dtype)` truncates the list of the recarray dtype into a string, which cannot be correctly recovered upon reconstruction, without conversion to string (i.e. `dtype=obj.dtype`) I get a circular reference exception :-( – Marti Nito Oct 02 '15 at 14:29
  • This encodes the values of the array safely, which is good. However, if you want the values in the resulting JSON to be human-readable, you can consider leaving out the `base64` library and simply convert to list. One could do `data_json = cont_obj.tolist()` in the encoder, `np.array(dct['__ndarray__'], dct['dtype']).reshape(dct['shape'])` in the decoder. – Def_Os Oct 08 '15 at 00:26
  • Hey, this fails on python3 on "RuntimeError: maximum recursion depth exceeded". Does anyone know why? – GuySoft Oct 19 '15 at 12:02
  • @GuySoft I was getting the recursion error until I added the check for np.generic suggested by the answer from ankostis http://stackoverflow.com/a/21322890/3571110. – proximous Oct 26 '15 at 04:46
  • @Community This was edited for C_CONTIGUOUS similar to my answer to http://stackoverflow.com/a/29853094/3571110. When I looked at this, I thought np.ascontiguousarray() was a no-op for C_CONTIGUOUS, making the if/else check unnecessary compared to simply always calling np.ascontiguousarray(). Am I correct? – proximous Oct 26 '15 at 05:15
  • ^ Yep, you are correct. As stated in the numpy doc http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ascontiguousarray.html, the array is C_CONTIGUOUS afterwards and the call has no effect if it was already. So obj_data = np.ascontiguousarray(obj) would also be fine. Thanks for that hint. – tlausch Feb 12 '16 at 07:55
  • To fix the infinite recursion problem I changed `return json.JSONEncoder(self, obj)` to `super(JsonNumpy, self).default(obj)` – Turn Apr 05 '16 at 21:52
  • @Turn but what is JsonNumpy? – apatsekin Sep 22 '18 at 05:23
  • @apatsekin Cut-n-paste typo. Should be `NumpyEncoder`. – Turn Sep 22 '18 at 21:33
  • I encounter an issue with decoding the following dict: `{"-0.1186": {"__ndarray__": ` When falls through to the super statement, the code logs an error >NumpyEncoder fell through for type . 08April2019_10:02:11 GenerateControlTable.py ERROR: Exception writing JSON file: Inappropriate argument type. Traceback (most recent call last): TypeError: Object of type 'bytes' is not JSON serializable – Colin Helms Apr 08 '19 at 17:08
  • Numpy::Encoder is encoding the obj.data as data_b64. From documentation, base64.b64encode(s, altchars=None): Encode the bytes-like object s using Base64 and return the encoded bytes. Using Python 3.6. The encoder.py seems to expect a strfloat() conversion somewere in the procedure. – Colin Helms Apr 08 '19 at 18:32
  • `data_b64 = repr(base64.b64encode(np.ascontiguousarray(obj).data))` seems to work, not sure I can get the array elements back from the file with the hook as written. `{"-0.1186": {"__ndarray__": "b'mpmZmZmZuT/dtYR8...` – Colin Helms Apr 08 '19 at 18:42
  • Nope, `data = base64.b64decode(dct['__ndarray__'])` is not decoding the serialized bytes correctly. First three elements, `data[0] = 110, data[1] = 106, data[3] = 102`. Should be =0.1, =0.1004, =0.1009. – Colin Helms Apr 09 '19 at 06:53
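Def_Os's suggestion in the comments above (drop base64 and store plain lists for human-readable JSON) can be sketched as follows; the class and hook names here are illustrative, not from the original answer. The output is larger but readable, and shape/dtype still round-trip:

```python
import json
import numpy as np

class ReadableNumpyEncoder(json.JSONEncoder):
    """List-based variant of the base64 encoder: human-readable output,
    at the cost of size and encode/decode speed for large arrays."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return dict(__ndarray__=obj.tolist(),
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        return json.JSONEncoder.default(self, obj)

def readable_numpy_hook(dct):
    # Rebuild the array from the nested list, restoring dtype and shape.
    if isinstance(dct, dict) and '__ndarray__' in dct:
        return np.array(dct['__ndarray__'], dct['dtype']).reshape(dct['shape'])
    return dct

arr = np.arange(6, dtype=np.float64).reshape(2, 3)
roundtrip = json.loads(json.dumps(arr, cls=ReadableNumpyEncoder),
                       object_hook=readable_numpy_hook)
assert np.array_equal(arr, roundtrip) and roundtrip.dtype == arr.dtype
```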

I'd use simplejson.dumps(somearray.tolist()) as the most convenient approach (if I were still using simplejson at all, which implies being stuck with Python 2.5 or earlier; 2.6 and later have a standard-library module json which works the same way, so of course I'd use that if the Python release in use supported it ;-).

In a quest for greater efficiency, you could subclass json.JSONEncoder (in json; I don't know if the older simplejson already offered such customization possibilities) and, in the default method, special-case instances of numpy.array by turning them into lists or tuples "just in time". I rather doubt you'd gain enough by such an approach, in terms of performance, to justify the effort, though.
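A minimal sketch of the subclassing approach described above (the class name is made up for illustration):

```python
import json
import numpy as np

class NumpyListEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert arrays to (nested) lists "just in time"; everything else
        # falls through to the base class, which raises TypeError.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

print(json.dumps({"a": np.arange(3)}, cls=NumpyListEncoder))
# → {"a": [0, 1, 2]}
```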

Alex Martelli
  • JSONEncoder's default method must return a serializable object, so it will be the same as returning `somearray.tolist()`. If you want something faster, you have to encode it yourself element by element. – Marco Sulla Mar 04 '16 at 22:36

I found this json subclass code for serializing one-dimensional numpy arrays within a dictionary. I tried it and it works for me.

import json
import numpy

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray) and obj.ndim == 1:
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

My dictionary is 'results'. Here's how I write to the file "data.json":

j = json.dumps(results, cls=NumpyAwareJSONEncoder)
with open("data.json", "w") as f:
    f.write(j)
Charles L.
Russ
  • This approach also works when you have a numpy array nested inside of a dict. This answer (I think) implied what I just said, but it's an important point. – Brad Campbell Jan 30 '13 at 19:59
  • This did not work for me. I had to use `return obj.tolist()` instead of `return [x for x in obj]`. – nwhsvc Jul 31 '13 at 00:04
  • I prefer using numpy's object to list - it should be faster to have numpy iterate through the list as opposed to having python iterate through. – Charles L. Jun 24 '14 at 21:56
  • What's the point of `and obj.ndim == 1`? This works even without that constraint. – Mike Vella Nov 07 '17 at 10:10

This shows how to convert from a 1D NumPy array to JSON and back to an array:

try:
    import json
except ImportError:
    import simplejson as json
import numpy as np

def arr2json(arr):
    return json.dumps(arr.tolist())
def json2arr(astr,dtype):
    return np.fromiter(json.loads(astr),dtype)

arr=np.arange(10)
astr=arr2json(arr)
print(repr(astr))
# '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
dt=np.int32
arr=json2arr(astr,dt)
print(repr(arr))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Building on tlausch's answer, here is a way to JSON-encode a NumPy array while preserving shape and dtype of any NumPy array -- including those with complex dtype.

import base64
import io
import json
import numpy as np

class NDArrayEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            output = io.BytesIO()
            np.savez_compressed(output, obj=obj)
            # decode so the value is a str, which is JSON serializable on Python 3
            return {'b64npz': base64.b64encode(output.getvalue()).decode('ascii')}
        return json.JSONEncoder.default(self, obj)


def ndarray_decoder(dct):
    if isinstance(dct, dict) and 'b64npz' in dct:
        output = io.BytesIO(base64.b64decode(dct['b64npz']))
        output.seek(0)
        return np.load(output)['obj']
    return dct

# Make expected non-contiguous structured array:
expected = np.arange(10)[::2]
expected = expected.view('<i4,<f4')

dumped = json.dumps(expected, cls=NDArrayEncoder)
result = json.loads(dumped, object_hook=ndarray_decoder)

assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.array_equal(expected, result), "Wrong Values"
unutbu

I just discovered tlausch's answer to this question and realized it almost solves my problem, but at least for me it does not work in Python 3.5, because of several errors: (1) infinite recursion, and (2) the data was saved as None.

Since I cannot directly comment on the original answer yet, here is my version:

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            data_b64 = base64.b64encode(obj_data)
            return dict(__ndarray__=data_b64.decode('utf-8'),
                        dtype=str(obj.dtype),
                        shape=obj.shape)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(100, dtype=np.float64)
dumped = json.dumps(expected, cls=NumpyEncoder)
result = json.loads(dumped, object_hook=json_numpy_obj_hook)


# None of the following assertions will be broken.
assert result.dtype == expected.dtype, "Wrong Type"
assert result.shape == expected.shape, "Wrong Shape"
assert np.allclose(expected, result), "Wrong Values"
Luindil
  • The solution has worked for me by replacing `result = json.loads(dumped, object_hook=json_numpy_obj_hook)` by `result = json.load(dumped, object_hook=NumpyEncoder.json_numpy_obj_hook)` – YYY Feb 19 '21 at 13:53

If you want to apply Russ's method to n-dimensional numpy arrays, you can try this:

import json
import numpy

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray):
            if obj.ndim == 1:
                return obj.tolist()
            else:
                return [self.default(obj[i]) for i in range(obj.shape[0])]
        return json.JSONEncoder.default(self, obj)

This will simply turn an n-dimensional array into a list of lists with depth "n". To cast such a list back into a numpy array, my_nparray = numpy.array(my_list) will work regardless of the list "depth".
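As a quick sanity check of the round trip described above (plain json suffices once the array is a list; note that only values and shape survive, not the dtype):

```python
import json
import numpy as np

original = np.arange(12).reshape(3, 4)               # 2-D array
as_list = original.tolist()                          # nested lists, depth 2
restored = np.array(json.loads(json.dumps(as_list)))
assert np.array_equal(original, restored)            # values and shape survive
```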

HerrIvan

Improving on Russ's answer, I would also include the np.generic scalars:

import json
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        elif isinstance(obj, np.generic):
            return obj.item()
        return json.JSONEncoder.default(self, obj)
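To see why the np.generic branch matters: numpy scalars such as the result of arr.sum() are not ndarrays, so the ndarray branch never fires for them. A small check (a sketch; the class is repeated here so the snippet stands alone):

```python
import json
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        elif isinstance(obj, np.generic):
            return obj.item()
        return json.JSONEncoder.default(self, obj)

# np.arange(4).sum() is a numpy integer scalar (np.generic),
# handled by the second branch via .item().
payload = {"total": np.arange(4).sum(), "values": np.arange(4)}
print(json.dumps(payload, cls=NumpyAwareJSONEncoder))
# → {"total": 6, "values": [0, 1, 2, 3]}
```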
ankostis

You can also do this with just a function passed into json.dumps, in this way:

json.dumps(np.array([1, 2, 3]), default=json_numpy_serializer)

With

import numpy as np

def json_numpy_serializer(o):
    """ Serialize numpy types for json

    Parameters:
        o (object): any python object which fails to be serialized by json

    Example:

        >>> import json
        >>> a = np.array([1, 2, 3])
        >>> json.dumps(a, default=json_numpy_serializer)

    """
    numpy_types = (
        np.bool_,
        # np.bytes_ -- python `bytes` class is not json serializable
        # np.complex64 -- python `complex` class is not json serializable
        # np.complex128 -- python `complex` class is not json serializable
        # np.complex256 -- special handling below
        # np.datetime64 -- python `datetime.datetime` class is not json serializable
        np.float16,
        np.float32,
        np.float64,
        # np.float128 -- special handling below
        np.int8,
        np.int16,
        np.int32,
        np.int64,
        # np.object_ -- should already be evaluated as python native
        np.str_,
        np.timedelta64,
        np.uint8,
        np.uint16,
        np.uint32,
        np.uint64,
        np.void,
    )

    if isinstance(o, np.ndarray):
        return o.tolist()
    elif isinstance(o, numpy_types):
        return o.item()
    elif isinstance(o, np.float128):
        return o.astype(np.float64).item()
    # elif isinstance(o, np.complex256): -- no python native for np.complex256
    #     return o.astype(np.complex128).item() -- python `complex` class is not json serializable
    else:
        raise TypeError("{} of type {} is not JSON serializable".format(repr(o), type(o)))

Validated with:

needs_additional_json_handling = (
    np.bytes_,
    np.complex64,
    np.complex128,
    np.complex256,
    np.datetime64,
    np.float128,
)


numpy_types = tuple(set(np.typeDict.values()))

for numpy_type in numpy_types:
    print(numpy_type)

    if numpy_type == np.void:
        # structured (compound) dtypes evaluate as np.void, e.g.
        numpy_type = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
    elif numpy_type in needs_additional_json_handling:
        print('python native can not be json serialized')
        continue

    a = np.ones(1, dtype=numpy_type)
    json.dumps(a, default=json_numpy_serializer)
The Doctor

One fast, though not truly optimal, way is using Pandas:

import pandas as pd
pd.Series(your_array).to_json(orient='values')
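Since orient='values' emits a plain JSON array, reading it back needs nothing more than json.loads; a sketch of the 1-D round trip:

```python
import json

import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])
s = pd.Series(arr).to_json(orient='values')   # a plain JSON array of values
restored = np.asarray(json.loads(s))          # back to a 1-D ndarray
assert np.allclose(arr, restored)
```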
John Zwinck
  • This only seems to work for 1D arrays, however `pd.DataFrame(your_array).to_json(orient='values')` seems to work for 2D arrays. – Martin Spacek Apr 17 '19 at 15:53