
I have a numpy array that contains a list of objects.

x = np.array([obj1,obj2,obj3])

Here is the definition of the object:

class obj():
    def __init__(self,id):
        self.id = id

obj1 = obj(6)
obj2 = obj(4)
obj3 = obj(2)

Instead of accessing the numpy array by the position of each object, I want to access it by the value of id.

For example:

# x[2] = obj3
# x[4] = obj2
# x[6] = obj1

After doing some research, I learned that I could make a structured array:

x = np.array([(3,2,1)],dtype=[('2', 'i4'),('4', 'i4'), ('6', 'i4')])

# x['2'] --> 3

However, the problem with this is that I want the array to take integers as indexes, and the field names in a dtype must be of type str. Furthermore, I don't think structured arrays can hold objects.
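On the last point, a structured array can carry Python objects if one field is given the object dtype ('O'). The following is only a minimal sketch under that assumption: the field names 'id' and 'obj' and the class name Obj are made up for illustration, and the lookup is a linear scan over the id field, not a constant-time access.

```python
import numpy as np

class Obj:
    def __init__(self, id):
        self.id = id

items = [Obj(6), Obj(4), Obj(2)]

# One integer field for the id, one object field ('O') for the instance.
# The field names 'id' and 'obj' are illustrative, not required by numpy.
rec = np.array([(o.id, o) for o in items],
               dtype=[('id', 'i4'), ('obj', 'O')])

# A boolean mask on the id field selects the matching objects (linear scan).
match = rec['obj'][rec['id'] == 4]
```

The field names are still strings here; the integer ids live in the data, not in the dtype.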

snowleopard
  • Can you tell us more about how this will actually be used? Is the real production code going to only have three elements in the array? Or how many? Do they all have unique IDs? Why not just use a `dict` to map from `id` to `obj`? – John Zwinck Dec 15 '15 at 08:03
  • The array will eventually have more than 1 million objects, all with unique IDs. I originally implemented it as a dict, but my eventual goal is to index it as x[[val1, val2, val3, ...]] and get an array back, and numpy arrays do a good job with this. – snowleopard Dec 15 '15 at 08:07
  • How is an array any better than list? What array functionality are you hoping to use? – hpaulj Dec 15 '15 at 08:45
  • How about using a sorted list of the ids? Or a `sqlite` database. – hpaulj Dec 15 '15 at 15:31
  • I wanted to use a numpy array because of the advantages described here: http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists?rq=1. I am essentially trying to create a numpy array subset from an already huge array. This subset will be accessed and manipulated. – snowleopard Dec 15 '15 at 20:40

2 Answers


You should be able to use filter() here, along with a lambda expression:

np.array(filter(lambda o: o.id == 1, x))

In Python 2, filter() returns a list, so the result is passed to np.array to get an array back. In Python 3, filter() returns an iterator, so it has to be wrapped in list() before being handed to np.array.

Note that this does not guard against duplicate keys if you want key-like access to your data: more than one object can have the same id attribute, so you may want to enforce uniqueness of keys yourself.
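As a rough sketch of the same idea that behaves the same way on Python 2 and Python 3, a list comprehension can replace filter(). The helper name select_by_id and the class name Obj are made up for illustration; the scan is still linear in the size of the array, so this is not an O(1) lookup.

```python
import numpy as np

class Obj:
    def __init__(self, id):
        self.id = id

x = np.array([Obj(6), Obj(4), Obj(2)], dtype=object)

def select_by_id(arr, wanted_id):
    # Hypothetical helper: scan every element and keep those whose id matches.
    matches = [o for o in arr if o.id == wanted_id]
    # dtype=object keeps the result a 1-D object array, even when it is empty.
    return np.array(matches, dtype=object)

found = select_by_id(x, 4)
```

If ids are unique, found holds exactly one element; an id that is not present gives an empty array.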

jbndlr
  • Will this lookup be O(1)? Also there should be no duplicate keys – snowleopard Dec 15 '15 at 08:18
  • No, the lookup will be the code example as given in my post. If you want another lookup call, you have to wrap it into a function or embed it into a new class. – jbndlr Dec 15 '15 at 10:07
  • @jbndlr Your code is incorrect for python 3+ since you cannot create numpy array from generator (`np.array(filter(...))` will create an array with size `(1, )` with a generator in the first cell). – Holt Dec 15 '15 at 10:42

If you only want to be able to access subarrays "by-index" (e.g. x[2, 4]), with index as id, then you could simply create your own struct:

import collections

class MyArray(collections.OrderedDict):
    def __init__(self, values):
        # Key each object by its id, preserving insertion order.
        super(MyArray, self).__init__((v.id, v) for v in values)
    def __rawgetitem(self, key):
        # Plain dict lookup that bypasses the overridden __getitem__.
        return super(MyArray, self).__getitem__(key)
    def __getitem__(self, key):
        # Accept a single id or an iterable of ids; always return a MyArray.
        if not hasattr(key, '__iter__'):
            key = (key, )
        return MyArray(self.__rawgetitem(k) for k in key)
    def __repr__(self):
        # Doubled braces produce literal { } in the formatted output.
        return 'MyArray({{{}}})'.format(', '.join('{}: {}'.format(k, self.__rawgetitem(k)) for k in self.keys()))

>>> class obj():
...     def __init__(self,id):
...         self.id = id
...     def __repr__ (self):
...         return "obj({})".format(self.id)
...
>>> obj1 = obj(6)
>>> obj2 = obj(4)
>>> obj3 = obj(2)
>>> x = MyArray([obj1, obj2, obj3])
>>> x
MyArray({6: obj(6), 4: obj(4), 2: obj(2)})
>>> x[4]
MyArray({4: obj(4)})
>>> x[2, 4]
MyArray({2: obj(2), 4: obj(4)})
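If the result needs to be a numpy array instead of a dict subclass, one possible alternative (not part of the approach above) is to keep a separate integer array of ids next to the object array and translate ids to positions with np.argsort and np.searchsorted. This is only a sketch assuming unique integer ids; the names Obj, objs, ids and lookup are made up for illustration.

```python
import numpy as np

class Obj:
    def __init__(self, id):
        self.id = id

objs = np.array([Obj(6), Obj(4), Obj(2)], dtype=object)
ids = np.array([o.id for o in objs])

order = np.argsort(ids)      # positions that would sort the ids ascending
sorted_ids = ids[order]

def lookup(wanted):
    # Hypothetical helper: map one id or a sequence of ids to the objects.
    wanted = np.atleast_1d(wanted)
    pos = np.searchsorted(sorted_ids, wanted)
    # Clip so that ids larger than every stored id do not index out of range.
    pos = np.clip(pos, 0, len(sorted_ids) - 1)
    if not np.array_equal(sorted_ids[pos], wanted):
        raise KeyError("unknown id in {}".format(wanted.tolist()))
    # Translate positions in the sorted view back to positions in objs.
    return objs[order[pos]]

subset = lookup([2, 4])
```

The result follows the order of the requested ids, and the original order of objs is left untouched. Each lookup is a binary search per id, so it is logarithmic in the array size, not constant time.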
Holt
  • Nice workaround, this was my original way of tackling the problem. However order does matter for me. So the x that was printed out should be 6,4,2 – snowleopard Dec 15 '15 at 20:10
  • @snowleopard Then simply use `collections.OrderedDict` instead of `dict`, see my updated answer. – Holt Dec 15 '15 at 20:23
  • Is there a way to make the output always be MyArray (i.e. even when the key has no __iter__)? I tried and got infinite recursion. – snowleopard Dec 16 '15 at 01:14