
I pass large `scipy.sparse` arrays to parallel processes on the shared memory of a single computing node. In each round of parallel jobs, the passed array is not modified. I want to pass the array with zero copy.

While this is possible with `multiprocessing.RawArray()` and `numpy.sharedmem` (see here), I am wondering how Ray's `put()` works.
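
For reference, a minimal sketch of the `RawArray` route I have in mind (the `init_worker`/`job` names are just illustrative): on a fork-based platform the shared buffer is inherited by the pool workers, and `np.frombuffer` wraps it without copying:

```python
import multiprocessing as mp
import numpy as np

_shared = {}

def init_worker(raw, shape):
    # Runs once in each worker: wrap the inherited shared buffer as an
    # ndarray view -- no data is copied.
    _shared["arr"] = np.frombuffer(raw, dtype=np.float64).reshape(shape)

def job(_):
    # Read-only work on the shared array; nothing is serialized per task.
    return _shared["arr"].sum()

if __name__ == "__main__":
    shape = (1000, 1000)
    raw = mp.RawArray("d", shape[0] * shape[1])   # lock-free shared buffer
    np.frombuffer(raw, dtype=np.float64)[:] = np.random.rand(shape[0] * shape[1])

    with mp.Pool(4, initializer=init_worker, initargs=(raw, shape)) as pool:
        print(pool.map(job, range(4)))
```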

As far as I understand (see memory management, [1], [2]), Ray's `put()` copies the object once and for all (serialize, then deserialize) into the object store, which is available to all processes.
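
For dense `ndarray`s, my understanding of that flow is roughly the following (the `frob_norm` task is just an illustration): per the serialization docs, the array each worker receives should be a read-only view backed by the shared object store, so the tasks themselves do not copy the data:

```python
import numpy as np
import ray

ray.init()

@ray.remote
def frob_norm(arr):
    # On the same node, `arr` is a read-only numpy view whose buffer lives
    # in the shared object store -- the task does not copy the array.
    return np.sqrt((arr ** 2).sum())

data = np.random.rand(5000, 5000)
obj_id = ray.put(data)                      # one copy into the object store
print(ray.get([frob_norm.remote(obj_id) for _ in range(4)]))
```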

Question:

I am not sure I understood this correctly: does `put()` place a deep copy of the entire array in the object store, or just a reference to it? Is there a way to not copy the object at all and instead pass only the address/reference of the existing scipy array, i.e., a true shallow copy without the overhead of copying the entire array?


Ubuntu 16.04, Python 3.7.6, Ray 0.8.5.

  • This can also be helpful to understand how zero-copy reads work: https://docs.ray.io/en/master/serialization.html?highlight=zero%20copy#serialization – Sang May 22 '20 at 22:33
  • A `scipy.sparse` matrix is not an `ndarray` subclass. It's a custom Python class, or rather several classes: different formats have different classes and data-storage attributes. One is actually a dictionary subclass; the others store the data (and indices) in several `ndarray`s. – hpaulj May 23 '20 at 03:33
  • @hpaulj That's no problem; I can pass the components of the scipy sparse array, i.e. the non-zero data and their indices, as `ndarray`s (sketched below). – Sia May 23 '20 at 04:15
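
A minimal sketch of that decomposition for the CSR format (the `decompose`/`rebuild` helpers are hypothetical, not Ray or scipy API):

```python
import scipy.sparse as sp

def decompose(csr):
    # The three ndarrays backing a CSR matrix; each can be put() into the
    # object store and read zero-copy by the workers.
    return csr.data, csr.indices, csr.indptr, csr.shape

def rebuild(data, indices, indptr, shape):
    # copy=False wraps the existing buffers instead of duplicating them.
    # Buffers read from Ray's object store are read-only, which is fine
    # as long as the matrix is never modified afterwards.
    return sp.csr_matrix((data, indices, indptr), shape=shape, copy=False)

m = sp.random(1000, 1000, density=0.01, format="csr")
m2 = rebuild(*decompose(m))
assert abs(m - m2).nnz == 0   # round-trip preserves the matrix
```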
