Automatically resizing NumPy recarray

Question

I'd like to create a subclass of numpy.recarray that automatically resizes when data is added to a row outside of its current length.

The code below does most of what I want.

class autorecarray(numpy.recarray):

   def __init__(self,*args,**kwargs):
      self._increment = 1
      numpy.recarray.__init__(self,args,kwargs)

   def __setitem__(self,ind,y):
      try: 
         numpy.recarray.__setitem__(self,ind,y)
      except IndexError:
         self.resize((self.__len__()+self._increment,),refcheck=False)
         self.__setitem__(ind,y)

It works fine for this use case:

a = utils.autorecarray((1,),formats=['i4','i4'])
a[1] = (1,2) # len(a) will now be 2

However, this usage raises an IndexError in the __getitem__ method of numpy.core.records.recarray:

a[2]['f1'] = 3

My initial attempt was to also override the __getitem__ method in my subclass, but this code does not work.

   def __getitem__(self,ind):
      try:
         numpy.recarray.__getitem__(self,ind)
      except IndexError:
         self.resize((self.__len__() + self._increment,),refcheck=False)
         self.__getitem__(ind)

It does automatically expand the array, but now every item in the array is None and cannot be changed.

Can anyone tell me what I'm doing wrong?

score 3 · Accepted Answer · answered Jun 19 '11 at 23:00

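First of all, you're missing the asterisks in the numpy.recarray.__init__ call, so the args tuple and the kwargs dict are passed as two positional arguments instead of being unpacked: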

def __init__(self, *args, **kwargs):
    self._increment = 1
    numpy.recarray.__init__(self, *args, **kwargs)
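
To see what the missing asterisks change, here is a small stand-alone illustration. The helpers `show`, `forward_packed` and `forward_unpacked` are hypothetical names made up for this sketch; none of them is part of NumPy.

```python
def show(*args, **kwargs):
    """Return the arguments exactly as the callee receives them."""
    return args, kwargs


def forward_packed(*args, **kwargs):
    # Mirrors the question's call: the tuple and the dict travel as two positional values.
    return show(args, kwargs)


def forward_unpacked(*args, **kwargs):
    # Mirrors the corrected call: positional and keyword arguments are forwarded unchanged.
    return show(*args, **kwargs)


packed = forward_packed((1,), formats=["i4", "i4"])
unpacked = forward_unpacked((1,), formats=["i4", "i4"])
```

In the packed case the callee sees two positional arguments and no keywords, so a `formats=` keyword would never reach its destination.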

Second, __getitem__ is missing its return statements, so it always returns None:

def __getitem__(self,ind):
    try:
        return numpy.recarray.__getitem__(self,ind)
    except IndexError:
        self.resize((self.__len__() + self._increment,),refcheck=False)
        return self.__getitem__(ind)
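
Putting the pieces together, the subclass might look roughly like the sketch below. Two things here are assumptions of the sketch rather than part of the answer: `_increment` is kept as a class attribute instead of being set in `__init__` (NumPy builds arrays in `__new__`), and the resizing is factored into a hypothetical `_grow` helper.

```python
import numpy


class autorecarray(numpy.recarray):
    # Assumption of this sketch: a class attribute replaces the __init__ override.
    _increment = 1

    def _grow(self):
        # Hypothetical helper. Enlarges the array in place; refcheck=False skips
        # NumPy's reference check, so older views must not be used afterwards.
        self.resize((len(self) + self._increment,), refcheck=False)

    def __setitem__(self, ind, y):
        try:
            numpy.recarray.__setitem__(self, ind, y)
        except IndexError:
            self._grow()
            self.__setitem__(ind, y)

    def __getitem__(self, ind):
        try:
            return numpy.recarray.__getitem__(self, ind)
        except IndexError:
            self._grow()
            return self.__getitem__(ind)


a = autorecarray((1,), formats=["i4", "i4"])
a[1] = (1, 2)    # grows to length 2, then stores the row
a[2]["f1"] = 3   # grows to length 3, then sets field f1 of the new row
```

Note that with this `__getitem__`, merely reading an out-of-range index also enlarges the array.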

score 2 · Answer 2 · edited May 23 '17 at 12:20

Your overridden __getitem__ doesn't return a value.

It took me a scarily long time to realize that.

Also, as Petr Viktorin points out, you've left out the * and ** operators in your __init__ call.
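
The symptom in the question follows from Python's rule that a function without a return statement returns None. A NumPy-free illustration, with hypothetical `Broken` and `Fixed` list subclasses standing in for the recarray subclass:

```python
class Broken(list):
    def __getitem__(self, ind):
        # The lookup result is computed and then discarded: no return statement.
        list.__getitem__(self, ind)


class Fixed(list):
    def __getitem__(self, ind):
        return list.__getitem__(self, ind)


rows = [{"f1": 0}]
broken_item = Broken(rows)[0]   # None, so broken_item["f1"] = 3 would raise TypeError
fixed_item = Fixed(rows)[0]     # the stored dict itself
fixed_item["f1"] = 3            # mutates the stored row
```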

answered Jun 19 '11 at 22:59 · senderle

@bellamyj, I should add that I don't necessarily recommend this approach. What exactly are you trying to do? – senderle Jun 19 '11 at 23:13
I have a few applications where I store output in a recarray, but don't know ahead of time how many rows are needed. To avoid violating the DRY principle, this seemed like a better solution than checking the array size and expanding as necessary in many locations throughout my code. – joshayers Jun 20 '11 at 00:33
Thinking about it more, though, overriding `__getitem__` in this way is dangerous. If you were to retrieve an element that's outside of the array's bounds, rather than write to it as in my example above, you would get incorrect results instead of an IndexError. I'll stick with only overriding `__setitem__`. – joshayers Jun 20 '11 at 00:37
@bellamyj, ok, I get your point. Does that mean you're just _appending_ to the array, rather than inserting a new value at `493` even though the array is only `15` items long? For the latter, I would suggest a sparse data structure. But if you're just appending, why not allocate memory in chunks? Rather than resizing the array by one item at a time, extend it by, say, 20%. Then when the room runs out, extend it again... you could even write a separate append method that would do this, rather than overriding `__setitem__` to have an unexpected side-effect. – senderle Jun 20 '11 at 01:11
Correct, I'm just appending to the array, so it won't be sparse. And I am actually increasing the array size in chunks. I have another method where I can change _increment to a value other than one. I just left it out of the code above for the sake of brevity. – joshayers Jun 20 '11 at 01:54
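
The separate append method suggested in the comments could be sketched as below. Everything here is hypothetical: the `GrowableRecords` name, its parameters, and the choice to wrap a recarray and copy into a larger one rather than subclass and resize in place. The 20% growth factor is the figure mentioned in the comment.

```python
import numpy


class GrowableRecords:
    """Hypothetical append-only buffer that grows in chunks."""

    def __init__(self, dtype, capacity=16, growth=0.2):
        self._data = numpy.recarray((capacity,), dtype=dtype)
        self._size = 0          # number of rows actually written
        self._growth = growth   # fraction of extra capacity added per resize

    def append(self, row):
        if self._size == len(self._data):
            # Grow by the growth fraction, but always by at least one row.
            extra = max(1, int(len(self._data) * self._growth))
            bigger = numpy.recarray((len(self._data) + extra,), dtype=self._data.dtype)
            bigger[:self._size] = self._data
            self._data = bigger
        self._data[self._size] = row
        self._size += 1

    @property
    def rows(self):
        # View of the filled part only; unused capacity stays hidden.
        return self._data[:self._size]


buf = GrowableRecords([("f0", "i4"), ("f1", "i4")], capacity=2)
for i in range(5):
    buf.append((i, i * i))
```

Because growth happens only inside `append`, indexing keeps its usual meaning and an out-of-range read still raises an IndexError.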
