
I am currently working to Cythonize a single class within a larger Python-based simulation. The class of interest is called "Bin" and primarily serves as a data container, though it does have a few methods as well. Initial tests have been promising and the class behaves exactly like its pure Python version, with one major exception: when multiple instances of "Bin" exist, each scalar data member has its own unique memory address, as does each Bin instance, but any and every buffer type I've tried ends up sharing a memory address BETWEEN Bin instances.

By way of illustration:

>>> import Bin
>>> x = Bin.Bin("b1")
>>> y = Bin.Bin("b2")
>>> x.assign_job(1,2)
True
>>> x.get_jobs()
array([2], dtype=int64)
>>> y.assign_job(3,4)
True
>>> y.get_jobs()
array([4], dtype=int64)
>>> x.get_jobs()
array([4], dtype=int64)

So despite calling methods on separate instances, the underlying memory is somehow being copied or shared. This is really bizarre, but there is only one probable culprit: the buffers share a memory address. To confirm that this was the case, I continued with:

>>> id(x) == id(y)
False
>>> id(x.name) == id(y.name)
False
>>> id(x.get_jobs()) == id(y.get_jobs())
True 

This does not seem to be an instance of Python simply reusing a memory address, as described in the question "Why do different methods of same object have the same `id`?". I come from a C background, however, so I'm not 100% sure how Python's garbage collection behaves. If Python really is just reusing an address here, a concrete example tied to my code would help convince me, because I don't believe that is what is happening.
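
For reference, the address-reuse effect that question describes looks roughly like this in a plain CPython session (a throwaway class, nothing to do with my Bin code):

>>> class C:
...     def m(self):
...         pass
...
>>> c = C()
>>> # Two short-lived bound-method objects: the first is freed (its refcount
>>> # hits zero) before the second is created, so CPython will usually, but
>>> # not always, hand out the same address again.
>>> id(c.m) == id(c.m)
True

The arrays returned by get_jobs() below are slices of buffers the instances hold onto, so I don't see how that kind of reuse would apply here, but I'd be happy to be shown otherwise.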

Consider the following additional tests:

>>> x = Bin.Bin()
>>> y = Bin.Bin()
>>> x.assign_job(1,2)
>>> a = x.get_jobs()
>>> a
array([2], dtype=int64)
>>> y.assign_job(3,4)
>>> b = y.get_jobs()
>>> id(a) == id(b)
False
>>> b
array([4], dtype=int64)
>>> a
array([4], dtype=int64)
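
To double-check that this is genuine sharing rather than address reuse, NumPy can be asked directly; given the behaviour above I would expect the following to print True (this assumes a NumPy recent enough to provide np.shares_memory; np.may_share_memory is the older, more conservative check):

>>> import numpy as np
>>> # Report whether the two returned views are backed by the same buffer.
>>> np.shares_memory(a, b)
True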

My question is: why in the world would this be happening?

I remember reading in the Cython docs that we aren't supposed to be able to "cdef" NumPy arrays, so perhaps my issue is simply that I'm dealing in a land of undefined behavior, but regardless I need to figure this out so I can keep moving forward. I have looked through Google and Stack Overflow for a day and a half and found exactly nothing describing a similar issue. I have also tried using typed memoryviews, with the same result (see the sketch below).
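
For what it's worth, the memoryview variant looked roughly like this (a cut-down sketch, not my exact code, and it assumes a platform where dtype=int maps to a C long, as in my output above):

import numpy as np

cdef class BinMV:          # hypothetical cut-down class, memoryview variant
    cdef long[:] jobs      # typed memoryview instead of np.ndarray

    def __init__(self, jobs=np.full(40, -1, dtype=int)):
        # Same pattern as the full class below: the default array is created
        # once, at function definition time, so every instance built without
        # an explicit `jobs` argument ends up viewing that one buffer.
        self.jobs = jobs

    def get_jobs(self):
        return np.asarray(self.jobs)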

As a note, I am not married to NumPy in this context. I would prefer to stick with it, primarily because NumPy arrays are going to be used in the Python script that creates instances of Bin, and secondarily because it's a lot less work than manually managing memory.

Condensed version of Bin class:

import numpy as np
cimport numpy as np

cdef class Bin:
    """ Container used to hold jobs. Can be seen as a server or data center """

    cdef public int capacity, consumed, reserved, jobCount, jobCapacity
    cdef public str name
    cdef np.ndarray job_ids
    cdef np.ndarray jobs

def __init__(self, str name="", int capacity=1000, int consumed=0, int reserved=0, int jobCount=0, int jobCapacity=40, np.ndarray job_ids=np.full(40, -1, dtype=int), np.ndarray jobs=np.full(40, -1, dtype=int)):
    self.name = name
    self.capacity = capacity
    self.consumed = consumed
    self.reserved = reserved
    self.jobCount = jobCount
    self.jobCapacity = jobCapacity
    self.job_ids = job_ids
    self.jobs = jobs

def __reduce__(self):
    return (self.__class__, (self.name, self.capacity, self.consumed, self.reserved, self.jobCount, self.jobCapacity, self.job_ids, self.jobs))

def assign_job(self, int jobid, int job):
    return self._assign_job(jobid, job)

cdef bint _assign_job(self, int jobid, int job):
    if self.consumed + self.reserved + job <= self.capacity:
        if self.jobCount == self.jobCapacity:
            self.jobs = np.concatenate((self.jobs, np.full(self.jobCapacity, -1, dtype=int)), axis=0)
            self.job_ids = np.concatenate((self.job_ids, np.full(self.jobCapacity, -1, dtype=int)), axis=0)
        self.jobs[self.jobCount] = job
        self.job_ids[self.jobCount] = jobid
        self.jobCount += 1
        self.consumed += job
        return True
    else:
        return False

def get_jobs(self):
    return self._getJobs()

cdef np.ndarray _getJobs(self):
    cdef np.ndarray[np.int_t, ndim=1, mode='c'] yobs
    yobs = self.jobs
    if yobs != None:
            return yobs[:self.jobCount]
    else:
            return yobs
  • Specifically, `jobs=np.full(40, -1, dtype=int)` in `__init__`. – g.d.d.c Mar 28 '16 at 22:30
  • I don't think this is the problem...see my edit. However, if I'm just missing it, how can I ensure that each instance of the class maintains its own persisted array? – MS-DDOS Mar 28 '16 at 22:51
  • @TylerS, yes, my mistake, read your code incorrectly. – Padraic Cunningham Mar 28 '16 at 22:56
  • You need to create your instances of your numpy arrays inside your `__init__`, not as a default argument to the function. So instead of `jobs=np.full(40, -1, dtype=int)` you'd want `jobs = None` and then `if jobs is None: self.jobs=np.full(40, -1, dtype=int)` within the actual function body. – g.d.d.c Mar 28 '16 at 22:58
  • Yup! Just fixed it after reading the "least astonishment" article. Thanks-a-million y'all! – MS-DDOS Mar 28 '16 at 23:03
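
For anyone who lands here with the same problem: following the comments above, the arrays need to be created inside `__init__` rather than as default arguments. A sketch of the corrected constructor (the rest of the class is unchanged):

    def __init__(self, str name="", int capacity=1000, int consumed=0, int reserved=0, int jobCount=0, int jobCapacity=40, np.ndarray job_ids=None, np.ndarray jobs=None):
        self.name = name
        self.capacity = capacity
        self.consumed = consumed
        self.reserved = reserved
        self.jobCount = jobCount
        self.jobCapacity = jobCapacity
        # Default argument values are evaluated once, when the function is
        # defined, so the arrays must be created here to give each Bin its
        # own buffer instead of a view of one shared default array.
        self.job_ids = job_ids if job_ids is not None else np.full(jobCapacity, -1, dtype=int)
        self.jobs = jobs if jobs is not None else np.full(jobCapacity, -1, dtype=int)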
