
The problem

When defining different objects in Cython, their memoryview attributes report the same address (id). However, the underlying arrays are still modified correctly when indexed into.

Background

I have a base class and a derived class written in Cython. I noticed that when I applied multiprocessing to the classes, the underlying buffers were altered across different processes, which was not intended. For the pickling procedure I wrote a simple __reduce__ method and a __deepcopy__ method that rebuild the original object. For the sake of clarity I reduced the complexity to the code below. Now my question is: why do the memoryviews return the same address? Additionally, why is the numpy array itself altered correctly even though the memoryview appears to be the same?
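(For context, the kind of __reduce__ I mean looks roughly like the sketch below; it is a simplified illustration with a hypothetical class name, not my actual method.)

import numpy as np

cdef class Picklable:
    cdef double[::1] inp
    def __init__(self, inp):
        self.inp = inp
    def __reduce__(self):
        # np.asarray re-exposes the buffer as an ndarray without copying;
        # pickling that ndarray copies the data, so a process that unpickles
        # this object rebuilds it around its own fresh buffer
        return (self.__class__, (np.asarray(self.inp),))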

# distutils: language=c++
import numpy as np
cimport numpy as np
cdef class Temp:
    cdef double[::1] inp
    def __init__(self, inp):
        print(f'id of inp = {id(inp)}')
        self.inp = inp

cdef np.ndarray x = np.ones(10)
cdef Temp a       = Temp(x)
cdef Temp b       = Temp(x)
cdef Temp c       = Temp(x.copy())
b.inp[0] = -1
c.inp[2] = 10
print(f'id of a.inp = {id(a.inp)}\nid of b.inp = {id(b.inp)}\nid of c.inp = {id(c.inp)}')
print(f'id of a.inp.base = {id(a.inp.base)}\nid of b.inp.base = {id(b.inp.base)}\nid of c.inp.base = {id(c.inp.base)}')

print('a.inp.base',a.inp.base)
print('b.inp.base',b.inp.base) # expected to be the same as a
print('c.inp.base',c.inp.base) # expected to be different to a/b

Output:

id of inp = 139662709551872
id of inp = 139662709551872
id of inp = 139662709551952
id of a.inp = 139662450248672
id of b.inp = 139662450248672
id of c.inp = 139662450248672
id of a.inp.base = 139662709551872
id of b.inp.base = 139662709551872
id of c.inp.base = 139662709551952
a.inp.base [-1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
b.inp.base [-1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
c.inp.base [ 1.  1. 10.  1.  1.  1.  1.  1.  1.  1.]

1 Answer


What we call a typed memoryview isn't a single class: depending on the context (Cython code or pure Python code) it changes its identity under the hood.

So let's start with

%%cython 
cdef class Temp:
    cdef double[::1] inp

Here double[::1] inp is of type __Pyx_memviewslice, which isn't a Python object:

typedef struct {
  struct {{memview_struct_name}} *memview;
  char *data;
  Py_ssize_t shape[{{max_dims}}];
  Py_ssize_t strides[{{max_dims}}];
  Py_ssize_t suboffsets[{{max_dims}}];
} {{memviewslice_name}};

What happens when we call id(self.inp)? Obviously, id is a pure-Python function, so a temporary Python object (a memoryview) must be created from self.inp (only to be able to call id) and is destroyed immediately afterwards. The creation of this temporary Python object is done via __pyx_memoryview_fromslice.

Knowing that, it is easy to explain why the ids are equal: despite being different objects, the temporary memoryviews coincidentally get the same address (and thus the same id, which is an implementation detail of CPython), because the memory is reused over and over again by CPython.

There are similar scenarios all over Python; here is an example for method objects, or an even simpler one:

class A:
    pass
# the lifetimes of the temporary objects don't overlap, so the ids can be equal
id(A())==id(A())
# output: True

# the lifetimes of the objects overlap, so the ids cannot be equal
a,b=A(), A()
id(a)==id(b)
# output: False

So in a nutshell: your expectation that the same id means the same object is wrong. This assumption only holds when the lifetimes of the objects overlap.
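You can see this with your own snippet by keeping the wrapper objects alive. The following sketch assumes it is appended at module level of the same Cython file as your code, so that a.inp and b.inp are accessible:

mv_a = a.inp          # untyped assignment keeps the temporary Python wrapper alive
mv_b = b.inp          # a second wrapper is created while mv_a still exists
print(id(mv_a) == id(mv_b))    # False: the lifetimes overlap, so the ids differ
print(mv_a.base is mv_b.base)  # True: both wrappers expose the same numpy array

Because mv_a and mv_b exist at the same time, CPython cannot reuse the memory, and the ids differ even though both wrappers point at the same buffer.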

ead
  • Interesting. This explains my observation of your first example. I didn't know about that memory implementation detail. Is there any way to bind these memviews to the cdef class? For example with the cython.binding directive? Thanks again for answering! – cvanelteren Dec 12 '18 at 13:52
  • To clarify: it may not be an issue since, as the link and you are saying, the lifetimes have no overlap, but as these classes will be running in separate processes they may have overlap. Hence I suspect that the memviews (if they are pointing to the same address) will overwrite the data, which is not intended. Based on the results I am getting from the simulations this seems to be the case. I could convert them to proper ndarrays, but unfortunately that cuts the performance by 1/3 compared to single-core run times. – cvanelteren Dec 12 '18 at 14:14
  • What @ead is saying (I think) is that when you call `id(memview)` it creates a temporary Python object _just for the purpose of displaying "an address"_, and this is what gets printed. In terms of what data gets changed, `id(memview.base)` is what's important, and this behaves as you expect. – DavidW Dec 12 '18 at 17:47
  • @GlobalTraveler Sorry, but I don't really understand what your concern is: even if the memviews themselves have the same address, they point to different buffers. – ead Dec 12 '18 at 19:03
  • Apologies for my lack of knowledge (Python dev getting into compiled code). I have this simulation code I wrote in Python and have now converted it to cdef classes with proper typing. It uses a memview for fast access to the state of the simulation. However, when I run these in parallel using multiprocessing.Pool I notice that when the buffers are accessed in the different processes they all change, which implies either that (a) my pickle/unpickle method does not work as intended or (b) the buffers are still pointing to the same object. One alternative that does work is rebuilding the object in each process. – cvanelteren Dec 12 '18 at 19:32
  • I was trying to get it to work directly with prange a few days ago, but I had to admit defeat: I know too little C/C++ to get that working in the near future. Multiprocessing is the other option, but that causes these mutation issues that I can't really track down. – cvanelteren Dec 12 '18 at 19:35
  • @GlobalTraveler That goes beyond the scope of this answer, and it is not possible to give a proper answer here. However, if I'm not mistaken: on Linux the data is shared without pickling/unpickling (shared as in "all processes work with the same data"), while on Windows it is pickled/unpickled, but completely separate processes are started, so there will be no data exchange between the processes. – ead Dec 12 '18 at 19:54
  • Yeah, I am aware of the copy-on-write feature. Ah well, more digging into the subject I guess. Thanks for the answers though; I will definitely do more with this in the future. – cvanelteren Dec 12 '18 at 20:02