
I'm looking for some sort of paradigm or implementation to efficiently handle many sets of coupled N-dim arrays (ndarrays). Specifically, I'm hoping for an implementation that allows me to slice an array of entire objects (e.g. someObjs = objects[100:200]), or individual attributes of those objects (e.g. somePars1 = objects.par1[100:200]) --- at the same time.

To expand on the above example, I could construct the following subsets in two ways:

def subset1(objects, beg, end):
    pars1 = [ obj.par1 for obj in objects[beg:end] ]
    pars2 = [ obj.par2 for obj in objects[beg:end] ]
    return pars1, pars2

def subset2(objects, beg, end):
    pars1 = objects.par1[beg:end]
    pars2 = objects.par2[beg:end]
    return pars1, pars2

And they would be identical.


Edit:

One approach would be to override the __getitem__ (etc) methods, something like,

class Objects(object):
    def __init__(self, p1, p2):
        self.par1 = p1
        self.par2 = p2
    ...
    def __getitem__(self, key):
        return Objects(self.par1[key], self.par2[key])

But this is horribly inefficient, and it duplicates the subset. Perhaps there's some way to return a view of the subset?

DilithiumMatrix
  • 1
    I don't quite understand the question. Are you trying to find a language that allows you to place the index in either position? This is antithetical to the structure of most languages. If you have a list of objects, the subscript expression *must* be applied directly to the list, not to the element. Your code is correct either way, depending on how you design your objects. However, you cannot have this dual nature in a language that honours type characteristics. – Prune Sep 28 '15 at 21:24
  • @Prune, I don't think that is the case. See the example I added. Achieving this functionality is certainly possible --- but I can't think of any way of doing it effectively/efficiently. – DilithiumMatrix Sep 28 '15 at 21:33
  • I understand now; thanks. Do keep in mind that this is inherently inefficient: you're accepting the natural structure, but then overlaying an artificial structure on that. Every reference to the artificial structure -- the view that you want -- requires dismantling and rearranging elements of the "correct" organization. However, a view pattern would likely be the way to go for maintainability. I don't know whether this tells you anything new; I'm likely just reinforcing what you feared. – Prune Sep 28 '15 at 21:41
  • @Prune, sure - thanks. It definitely might be a lost-cause, but perhaps there's something not ***too*** inefficient. – DilithiumMatrix Sep 28 '15 at 21:42
  • 1
    I think this can be a good approach, actually. One thing to keep in mind is that slicing of numpy arrays always returns a view (as long as you use a slice and not "fancy indexing" with a list/tuple). Therefore, your example actually doesn't duplicate that much memory at all, provided you enforce contiguous slices. However, the usual caveats with sharing views apply: If you modify one, you're modifying all and you need to be aware of what makes copies and what doesn't when using things. – Joe Kington Sep 28 '15 at 21:56
  • @JoeKington thanks! Can you give (or point me towards) more information on assuring that I only use views instead of duplication? I think that would be enough for an `answer`. – DilithiumMatrix Sep 28 '15 at 22:00
  • 1
    The numpy documentation goes over it in quite a bit of detail: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing Not to plug one of my own answers too much, but you might find this useful: http://stackoverflow.com/questions/4370745/view-onto-a-numpy-array/4371049#4371049 It specifically deals with how to avoid copies and ensure views. I'm too short on time at the moment for a full answer, so if someone wants to condense things and write one up, please feel free! – Joe Kington Sep 28 '15 at 22:05

1 Answer


Object array and object with array approach

A sample object class

In [56]: class MyObj(object):
   ....:     def __init__(self, par1,par2):
   ....:         self.par1=par1
   ....:         self.par2=par2

An array of those objects - little more than a list with an array wrapper

In [57]: objects=np.array([MyObj(1,2),MyObj(3,4),MyObj(2,3),MyObj(10,11)])
In [58]: objects
Out[58]: 
array([<__main__.MyObj object at 0xb31b196c>,
       <__main__.MyObj object at 0xb31b116c>,
       <__main__.MyObj object at 0xb31b13cc>,
       <__main__.MyObj object at 0xb31b130c>], dtype=object)

subset1 type of selection:

In [59]: [obj.par1 for obj in objects[1:-1]]
Out[59]: [3, 2]

Another class that can contain such an array. This is simpler than defining an array subclass:

In [60]: class MyObjs(object):
   ....:     def __init__(self,anArray):
   ....:         self.data=anArray
   ....:     def par1(self):
   ....:         return [obj.par1 for obj in self.data]

In [61]: Obs = MyObjs(objects)
In [62]: Obs.par1()
Out[62]: [1, 3, 2, 10]

subset2 type of selection:

In [63]: Obs.par1()[1:-1]
Out[63]: [3, 2]

For now par1 is a method, but it could be made a property, permitting the Obs.par1[1:-1] syntax.

If par1 returned an array instead of a list, indexing would be more powerful.
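As a rough sketch of those two remarks, par1 can be written as a property that returns an array. The class name MyObjsProp is hypothetical (it is not part of the session above), and MyObj is redefined only so the snippet runs on its own. Note that the property builds a fresh array on every access, so this improves the syntax but not the efficiency.

```python
import numpy as np

class MyObj(object):
    def __init__(self, par1, par2):
        self.par1 = par1
        self.par2 = par2

class MyObjsProp(object):
    # hypothetical variant of MyObjs: par1 is a property returning an array
    def __init__(self, anArray):
        self.data = anArray

    @property
    def par1(self):
        # collects the attribute into a new array on each access (a copy)
        return np.array([obj.par1 for obj in self.data])

objects = np.array([MyObj(1, 2), MyObj(3, 4), MyObj(2, 3), MyObj(10, 11)],
                   dtype=object)
obs = MyObjsProp(objects)

sel = obs.par1[1:-1]              # attribute syntax, then slice
big = obs.par1[obs.par1 > 2]      # boolean masks work because par1 is an array
print(sel.tolist(), big.tolist())
```

With a list instead of an array, the boolean-mask selection in the last lines would not be available.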

If MyObjs had a __getitem__ method, then it could be indexed with

Obs[1:-1]

That method could be defined in various ways, though the simplest is to apply the indexing 'slice' to the 'data':

def __getitem__(self, key):
    # not tested; applies the index to the underlying array and wraps the result
    return MyObjs(self.data[key])
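Put together, the container with a __getitem__ method might look like the sketch below. It is meant for slice keys only: an integer key would hand a single MyObj to the constructor rather than an array, which this sketch does not handle.

```python
import numpy as np

class MyObj(object):
    def __init__(self, par1, par2):
        self.par1 = par1
        self.par2 = par2

class MyObjs(object):
    def __init__(self, anArray):
        self.data = anArray

    def par1(self):
        return [obj.par1 for obj in self.data]

    def __getitem__(self, key):
        # slice the underlying object array and wrap the result again
        return MyObjs(self.data[key])

objects = np.array([MyObj(1, 2), MyObj(3, 4), MyObj(2, 3), MyObj(10, 11)],
                   dtype=object)
Obs = MyObjs(objects)
sub = Obs[1:-1]

# both orders of selection give the same values
print(sub.par1(), Obs.par1()[1:-1])
# the sliced object array is a view of the original array of references
print(sub.data.base is Obs.data)
```

The slice itself copies nothing; the cost remains in par1(), which still loops over the objects in Python.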

I'm focusing just on syntax, not on efficiency. In general, numpy arrays of general objects are not very fast or powerful. Such arrays are basically lists of pointers to the objects.

Structured array and recarray version

Another possibility is np.recarray. Another poster was just asking about their names. A recarray is essentially a structured array whose fields can be accessed as attributes.

With a structured array definition:

In [64]: dt = np.dtype([('par1', int), ('par2', int)])
In [66]: Obj1 = np.array([(1,2),(3,4),(2,3),(10,11)], dtype=dt)
In [67]: Obj1
Out[67]: 
array([(1, 2), (3, 4), (2, 3), (10, 11)], 
      dtype=[('par1', '<i4'), ('par2', '<i4')])
In [68]: Obj1['par1'][1:-1]
Out[68]: array([3, 2])
In [69]: Obj1[1:-1]['par1']
Out[69]: array([3, 2])

or as recarray

In [79]: Objrec=np.rec.fromrecords(Obj1,dtype=dt)
In [80]: Objrec.par1
Out[80]: array([ 1,  3,  2, 10])
In [81]: Objrec.par1[1:-1]
Out[81]: array([3, 2])
In [82]: Objrec[1:-1].par1
Out[82]: array([3, 2])
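On the view-versus-copy concern raised in the question and comments, here is a small sketch using the same dtype. It assumes Obj1.view(np.recarray) as the way to get a recarray without copying, which is a swap for the np.rec.fromrecords call used above. A basic slice shares memory with the original, while indexing with a list ("fancy indexing") makes a copy.

```python
import numpy as np

dt = np.dtype([('par1', int), ('par2', int)])
Obj1 = np.array([(1, 2), (3, 4), (2, 3), (10, 11)], dtype=dt)
Objrec = Obj1.view(np.recarray)      # recarray view of the same buffer

sub = Objrec[1:-1]                   # basic slice: a view
print(np.shares_memory(sub, Obj1))   # True

sub.par1[0] = 99                     # writes through to the original
print(Obj1['par1'].tolist())         # [1, 99, 2, 10]

picked = Objrec[[1, 2]]              # list index: a copy
print(np.shares_memory(picked, Obj1))  # False
```

The usual caveat with views applies: modifying sub modifies Obj1 and Objrec as well.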
hpaulj