
This thread discusses options for serializing NumPy arrays while preserving their shape and dtype. The most upvoted answer there works, but it is about 5x slower than using `np.ndarray.tobytes` and `np.frombuffer`. Is there a faster option?

import io
import numpy as np

x = np.ones((500, 500, 3), dtype=np.uint32)

# Fast, but loses dtype and shape: decode must hardcode them.
def encode(x: np.ndarray) -> bytes:
    return x.tobytes()

def decode(s: bytes) -> np.ndarray:
    return np.frombuffer(s, dtype=np.uint32).reshape(500, 500, 3)


# Keeps dtype and shape!
def encode2(x: np.ndarray) -> bytes:
    memfile = io.BytesIO()
    np.save(memfile, x)
    return memfile.getvalue()


def decode2(s: bytes) -> np.ndarray:
    memfile = io.BytesIO(s)
    return np.load(memfile)

%%timeit
decode(encode(x))        # 0.193 ms per loop
# decode2(encode2(x))    # 1.09 ms per loop
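One middle ground, sketched below, is to keep the raw-bytes path but prepend a small handwritten header carrying the dtype string and shape, so decoding no longer hardcodes them. The `encode3`/`decode3` names and the header layout (dtype-string length, dtype string, ndim, shape as int64s) are assumptions of this example, not an established format; for interoperability the `.npy` format used by `np.save` is the safer choice.

```python
import struct
import numpy as np

def encode3(x: np.ndarray) -> bytes:
    # Header: [1 byte dtype-string length][dtype string][1 byte ndim][ndim x int64 shape]
    dt = x.dtype.str.encode()  # e.g. b'<u4' for little-endian uint32
    header = struct.pack('B', len(dt)) + dt
    header += struct.pack('B', x.ndim) + struct.pack(f'{x.ndim}q', *x.shape)
    return header + x.tobytes()

def decode3(s: bytes) -> np.ndarray:
    n = s[0]
    dt = np.dtype(s[1:1 + n].decode())
    pos = 1 + n
    ndim = s[pos]
    pos += 1
    shape = struct.unpack_from(f'{ndim}q', s, pos)
    pos += 8 * ndim
    # frombuffer avoids copying the payload; the result is read-only.
    return np.frombuffer(s, dtype=dt, offset=pos).reshape(shape)

x = np.ones((500, 500, 3), dtype=np.uint32)
y = decode3(encode3(x))
assert y.shape == x.shape and y.dtype == x.dtype
```

The header is ~30 bytes, so the cost per round trip should stay close to the plain `tobytes`/`frombuffer` numbers, but this has not been benchmarked here.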
gebbissimo
    The array you want to encode/decode takes `500*500*3*4 = 2.86 MiB` in memory. This may be small enough to fit in the L3 cache, depending on the platform; otherwise it sits in RAM. The decode+encode needs to copy the 2.86 MiB twice, assuming there is no zero-copy optimization (I highly doubt this is possible if the target array needs to be modified, because it would be very dangerous). That means a throughput of `2.86*2*2/0.193e-3/1024 = 57.9 GiB/s`. This is actually very good for sequential code and certainly optimal if a copy is needed (especially if this is on a mainstream PC -- probably not) – Jérôme Richard Mar 10 '23 at 15:37
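The arithmetic in the comment above can be checked directly (2.86 MiB copied twice per round trip, each copy both reading and writing the data, in 0.193 ms):

```python
# Reproduce the throughput estimate from the comment.
size_mib = 500 * 500 * 3 * 4 / 2**20          # ~2.86 MiB payload
gib_per_s = size_mib * 2 * 2 / 0.193e-3 / 1024  # 2 copies x (read + write)
print(f"{gib_per_s:.1f} GiB/s")               # ~57.9 GiB/s
```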

0 Answers