I am currently working to Cythonize a single class within a larger Python-based simulation. The class of interest is called "Bin" and primarily serves as a data container, though it has a few methods as well. Initial tests have been promising, and my class behaves exactly the same as its pure Python version, with one major exception. When multiple instances of "Bin" exist, each Bin instance has its own unique memory address, as does each data member, but every buffer type I've tried ends up sharing a memory address BETWEEN Bin instances.
By way of illustration:
>>> import Bin
>>> x = Bin.Bin("b1")
>>> y = Bin.Bin("b2")
>>> x.assign_job(1,2)
True
>>> x.get_jobs()
array([2], dtype=int64)
>>> y.assign_job(3,4)
True
>>> y.get_jobs()
array([4], dtype=int64)
>>> x.get_jobs()
array([4], dtype=int64)
So despite calling methods on separate instances, the underlying memory is somehow being copied or shared. This is really bizarre, but there is only one probable culprit: the buffers share a memory address. To confirm that this was the case, I continued with:
>>> id(x) == id(y)
False
>>> id(x.name) == id(y.name)
False
>>> id(x.get_jobs()) == id(y.get_jobs())
True
This does not seem to be an instance of Python simply reusing a memory address, as described in the question "Why do different methods of same object have the same `id`?". I come from a C background, however, so I'm not 100% sure how Python garbage collection behaves. Perhaps a meaningful example related to my code would help show that Python is in fact simply reusing a memory address, as I am not convinced that is what is happening here.
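For reference, the address-reuse effect from that question can be sketched in plain Python. This is only an illustration and not part of my Bin code: `make_list` is a made-up helper, and whether the first comparison comes out True is a CPython implementation detail, so I would not rely on it.

```python
def make_list():
    """Hypothetical helper: returns a brand-new list on every call."""
    return [0]


# Each temporary list becomes garbage as soon as id() returns, so CPython is
# free to reuse its address for the next allocation. This often prints True,
# but it is an implementation detail and not guaranteed.
temporaries_match = id(make_list()) == id(make_list())
print(temporaries_match)

# Both lists are alive at the same time here, so their ids must differ.
first = make_list()
second = make_list()
live_match = id(first) == id(second)
print(live_match)  # False
```

Two objects that are alive at the same moment can never share an `id`, so holding references to both results is the more trustworthy comparison.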
Consider the following additional tests:
>>> x = Bin.Bin()
>>> y = Bin.Bin()
>>> x.assign_job(1,2)
>>> a = x.get_jobs()
>>> a
array([2], dtype=int64)
>>> y.assign_job(3,4)
>>> b = y.get_jobs()
>>> id(a) == id(b)
False
>>> b
array([4], dtype=int64)
>>> a
array([4], dtype=int64)
My question is: why in the world would this be happening?
I remember reading in the Cython docs that we aren't supposed to be able to "cdef" NumPy arrays, so perhaps my issue is simply that I'm in the land of undefined behavior, but regardless I need to figure this out so I can keep moving forward. I have looked through Google and Stack Overflow for a day and a half and found nothing describing a similar issue. I have also tried using memoryviews, with the same result.
As a note, I am not married to NumPy in this context. I would prefer to stick with it, primarily because NumPy arrays are going to be used in the Python script that creates instances of Bin, and secondarily because it's a lot less work than manually managing memory.
Condensed version of Bin class:
import numpy as np
cimport numpy as np

cdef class Bin:
    """ Container used to hold jobs. Can be seen as a server or data center """
    cdef public int capacity, consumed, reserved, jobCount, jobCapacity
    cdef public str name
    cdef np.ndarray job_ids
    cdef np.ndarray jobs

    def __init__(self, str name="", int capacity=1000, int consumed=0,
                 int reserved=0, int jobCount=0, int jobCapacity=40,
                 np.ndarray job_ids=np.full(40, -1, dtype=int),
                 np.ndarray jobs=np.full(40, -1, dtype=int)):
        self.name = name
        self.capacity = capacity
        self.consumed = consumed
        self.reserved = reserved
        self.jobCount = jobCount
        self.jobCapacity = jobCapacity
        self.job_ids = job_ids
        self.jobs = jobs

    def __reduce__(self):
        return (self.__class__, (self.name, self.capacity, self.consumed, self.reserved, self.jobCount, self.jobCapacity, self.job_ids, self.jobs))

    def assign_job(self, int jobid, int job):
        return self._assign_job(jobid, job)

    cdef bint _assign_job(self, int jobid, int job):
        if self.consumed + self.reserved + job <= self.capacity:
            if self.jobCount == self.jobCapacity:
                self.jobs = np.concatenate((self.jobs, np.full(self.jobCapacity, -1, dtype=int)), axis=0)
                self.job_ids = np.concatenate((self.job_ids, np.full(self.jobCapacity, -1, dtype=int)), axis=0)
            self.jobs[self.jobCount] = job
            self.job_ids[self.jobCount] = jobid
            self.jobCount += 1
            self.consumed += job
            return True
        else:
            return False

    def get_jobs(self):
        return self._getJobs()

    cdef np.ndarray _getJobs(self):
        cdef np.ndarray[np.int_t, ndim=1, mode='c'] yobs
        yobs = self.jobs
        if yobs != None:
            return yobs[:self.jobCount]
        else:
            return yobs
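One candidate explanation worth ruling out is that the `np.full(...)` defaults in `__init__` are evaluated once, when the function is defined, so every Bin created without explicit arrays receives the same two array objects. The sketch below is a pure-Python reduction under that assumption, not the Cython class itself: `SharedBin` and `SeparateBin` are hypothetical names, and `array.array` plus `memoryview` stand in for `np.ndarray` and its slice views so the example needs only the standard library.

```python
from array import array


class SharedBin:
    """Hypothetical pure-Python stand-in for Bin (array.array replaces np.ndarray)."""

    # The default array is created once, when the def statement runs, not once
    # per call, so every call that omits `jobs` receives the same object.
    def __init__(self, name="", jobs=array("q", [-1] * 40)):
        self.name = name
        self.job_count = 0
        self.jobs = jobs

    def assign_job(self, job):
        self.jobs[self.job_count] = job
        self.job_count += 1
        return True

    def get_jobs(self):
        # Slicing a memoryview yields a view onto the same buffer, not a copy,
        # which is also how basic NumPy slicing behaves.
        return memoryview(self.jobs)[: self.job_count]


class SeparateBin(SharedBin):
    """Same class, but a fresh buffer is allocated for each instance."""

    def __init__(self, name="", jobs=None):
        if jobs is None:
            jobs = array("q", [-1] * 40)
        super().__init__(name, jobs)


x, y = SharedBin("b1"), SharedBin("b2")
x.assign_job(2)
a = x.get_jobs()
y.assign_job(4)
b = y.get_jobs()
print(x.jobs is y.jobs)          # True: one buffer behind both instances
print(a is b)                    # False: two distinct view objects
print(a.tolist(), b.tolist())    # [4] [4]

p, q = SeparateBin("b1"), SeparateBin("b2")
p.assign_job(2)
q.assign_job(4)
print(p.jobs is q.jobs)                              # False
print(p.get_jobs().tolist(), q.get_jobs().tolist())  # [2] [4]
```

If this is what is happening in Bin, it would match both symptoms above: `a` and `b` are different objects with different ids, yet they are views onto the same underlying buffer, so writing through one instance shows up in the other. Defaulting the array arguments to `None` and allocating inside `__init__` would be the corresponding change to try.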