
This thread discusses options for serializing NumPy arrays while preserving their shape and dtype. The most upvoted answer there works, but it is about 5x slower than using `np.ndarray.tobytes` and `np.frombuffer`. Is there a faster option?

import io
import numpy as np

x = np.ones((500, 500, 3), dtype=np.uint32)

# Fast, but loses dtype and shape: decode must hardcode them.
def encode(x: np.ndarray) -> bytes:
    return x.tobytes()

def decode(s: bytes) -> np.ndarray:
    return np.frombuffer(s, dtype=np.uint32).reshape(500, 500, 3)


# Keeps dtype and shape!
def encode2(x: np.ndarray) -> bytes:
    memfile = io.BytesIO()
    np.save(memfile, x)
    return memfile.getvalue()


def decode2(s: bytes) -> np.ndarray:
    memfile = io.BytesIO(s)
    return np.load(memfile)

%%timeit
decode(encode(x))        # 0.193 ms per loop
# decode2(encode2(x))    # 1.09 ms per loop
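One middle ground, sketched below, is to keep the raw-bytes path but prepend a small handwritten header carrying the dtype string and shape, so decoding no longer hardcodes them. The `encode3`/`decode3` names and the header layout (dtype-string length, dtype string, ndim, shape as int64s) are assumptions of this example, not an established format; for interoperability the `.npy` format used by `np.save` is the safer choice.

```python
import struct
import numpy as np

def encode3(x: np.ndarray) -> bytes:
    # Header: [1 byte dtype-string length][dtype string][1 byte ndim][ndim x int64 shape]
    dt = x.dtype.str.encode()  # e.g. b'<u4' for little-endian uint32
    header = struct.pack('B', len(dt)) + dt
    header += struct.pack('B', x.ndim) + struct.pack(f'{x.ndim}q', *x.shape)
    return header + x.tobytes()

def decode3(s: bytes) -> np.ndarray:
    n = s[0]
    dt = np.dtype(s[1:1 + n].decode())
    pos = 1 + n
    ndim = s[pos]
    pos += 1
    shape = struct.unpack_from(f'{ndim}q', s, pos)
    pos += 8 * ndim
    # frombuffer avoids copying the payload; the result is read-only.
    return np.frombuffer(s, dtype=dt, offset=pos).reshape(shape)

x = np.ones((500, 500, 3), dtype=np.uint32)
y = decode3(encode3(x))
assert y.shape == x.shape and y.dtype == x.dtype
```

The header is ~30 bytes, so the cost per round trip should stay close to the plain `tobytes`/`frombuffer` numbers, but this has not been benchmarked here.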
gebbissimo
    The array you want to encode/decode takes `500*500*3*4 = 2.86 MiB` in memory. This may be small enough to fit in the L3 cache, depending on the platform; otherwise it sits in RAM. The decode+encode needs to copy the 2.86 MiB twice, assuming there is no zero-copy optimization (I highly doubt this is possible if the target array needs to be modified, because it would be very dangerous). That means a throughput of `2.86*2*2/0.193e-3/1024 = 57.9 GiB/s`. This is actually very good for sequential code and certainly optimal if a copy is needed (especially if this is on a mainstream PC -- probably not) – Jérôme Richard Mar 10 '23 at 15:37
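The arithmetic in the comment above can be checked directly (2.86 MiB copied twice per round trip, each copy both reading and writing the data, in 0.193 ms):

```python
# Reproduce the throughput estimate from the comment.
size_mib = 500 * 500 * 3 * 4 / 2**20          # ~2.86 MiB payload
gib_per_s = size_mib * 2 * 2 / 0.193e-3 / 1024  # 2 copies x (read + write)
print(f"{gib_per_s:.1f} GiB/s")               # ~57.9 GiB/s
```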

0 Answers