I want to extend the structured array object in numpy such that I can easily add new elements.

For example, for a simple structured array

>>> import numpy as np
>>> x=np.ndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]

I would like to easily add a new element x['C']=[5,6], but this raises an error because the field name 'C' is not defined in the dtype.

Adding a new method to a subclass of np.ndarray works:

import numpy as np
class sndarray(np.ndarray):
    def column_stack(self,i,x):
        formats=['f8']*len(self.dtype.names)
        new=sndarray(shape=self.shape,dtype={'names':list(self.dtype.names)+[i],'formats':formats+['f8']})
        for key in self.dtype.names:
            new[key]=self[key]

        new[i]=x

        return new

Then,

>>> x=sndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x=x.column_stack('C',[4,4])
>>> x
sndarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)], 
  dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Is there any way to add the new element in a dictionary-like way? For example:

>>> x['C']=[4,4]
>>> x
sndarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)], 
  dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Update:

By using __setitem__ I am still one step away from the ideal solution, because I don't know how to change the object referenced by self:

import numpy as np

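class sdarray(np.ndarray):
    def __setitem__(self, i, x):
        if i in self.dtype.names:
            # existing field: defer to the normal ndarray assignment
            super(sdarray, self).__setitem__(i, x)
        else:
            # unknown field: build a new array with one extra 'f8' field
            formats=['f8']*len(self.dtype.names)
            new=sdarray(shape=self.shape,dtype={'names':list(self.dtype.names)+[i],'formats':formats+['f8']})
            for key in self.dtype.names:
                new[key]=self[key]

            new[i]=x

            # self cannot be replaced from here, so keep the new array as an attribute
            self.with_new_column=new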

Then

>>> x=sdarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x['C']=[4,4]
>>> x=x.with_new_column #extra ugly step!
>>> x
sdarray([(1.0, 3.0, 4.0), (2.0, 4.0, 4.0)], 
  dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Update 2: After seeing the right implementation in the selected answer, I figured out that the problem is already solved by the pandas DataFrame object:

>>> import numpy as np
>>> import pandas as pd
>>> x=np.ndarray((2,),dtype={'names':['A','B'],'formats':['f8','f8']})
>>> x=pd.DataFrame(x)
>>> x['A']=[1,2]
>>> x['B']=[3,4]
>>> x['C']=[4,4]
>>> x
   A  B  C
0  1  3  4
1  2  4  4
>>> 
restrepo
  • Take a look at the [`__setitem__()`](http://www.diveintopython.net/object_oriented_framework/special_class_methods.html#fileinfo.specialmethods.setitem.example) magic method. – Gareth Latty Apr 17 '13 at 00:27
  • I tried [this](https://gist.github.com/rescolo/c0c7dda3ea4d59f58958); `x['C']=[4,4]` was accepted, but `x` itself was not updated – restrepo Apr 17 '13 at 00:38
  • `self = new` doesn't change the object referenced at `self`, it just changes the name `self` to point to `new`. – Gareth Latty Apr 17 '13 at 00:54

1 Answer

Use numpy.recarray instead. In my numpy 1.6.1 this gives you an extra method, field, that does not exist when you subclass from numpy.ndarray.

This question or this one (if using numpy 1.3) also discuss adding a field to a structured array. From there you will see that using:

import numpy.lib.recfunctions as rf
rf.append_fields( ... )

can greatly simplify your life. At first glance I thought this function would append to the original array, but it creates a new instance instead. The class shown below uses your solution for __setitem__(), which works very well.
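As a minimal sketch of that behaviour (the array contents reuse the question's example, and `usemask=False` is my choice here so that a plain ndarray comes back rather than a masked array), the call returns a new array and leaves the original untouched:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Structured array with the two fields from the question.
x = np.zeros(2, dtype=[('A', 'f8'), ('B', 'f8')])
x['A'] = [1, 2]
x['B'] = [3, 4]

# append_fields builds and returns a NEW array; x is not modified.
y = rf.append_fields(x, 'C', np.array([4.0, 4.0]), usemask=False)

print(x.dtype.names)  # ('A', 'B')
print(y.dtype.names)  # ('A', 'B', 'C')
```

Since the result is a separate object, the caller still has to rebind the name (`x = rf.append_fields(x, ...)`) to "see" the new field.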

The issue that led you to the ugly solution was reported in another question. The problem is that when you do self=... you are only binding the new object to a local name; the sdarray instance that the caller holds is not updated. Maybe it is possible to destroy and reconstruct the instance from inside its own method, but based on that discussion the following class can be created instead, in which ndarray is not subclassed but stored and called internally. Some other methods were added to make it work and look as if you were working directly with an ndarray. I did not test it in detail.
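The rebinding problem can be seen without numpy at all. The following is only an illustrative sketch; the `Box` class and its method names are hypothetical and not part of the question's code:

```python
class Box(object):  # hypothetical class, for illustration only
    def __init__(self, value):
        self.value = value

    def rebind(self, other):
        # Only rebinds the local name 'self'; the caller's object is untouched.
        self = other

    def replace_contents(self, other):
        # Mutates the object itself, so the caller sees the change.
        self.value = other.value


a = Box(1)
b = Box(2)

a.rebind(b)
after_rebind = a.value
print(after_rebind)  # 1

a.replace_contents(b)
print(a.value)  # 2
```

Storing the array as an attribute and swapping that attribute, as the wrapper class does, is the same idea as `replace_contents`.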

For automatic resizing, a good solution has been presented here. You can also incorporate it into your code.

import numpy as np

class sdarray(object):
    def __init__(self, *args, **kwargs):
        self.recarray =  np.recarray( *args, **kwargs)

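    def __getattr__(self,attr):
        # Only called when normal lookup fails, so delegate to the wrapped
        # recarray. Looking up 'recarray' itself here would recurse forever.
        if attr == 'recarray':
            raise AttributeError(attr)
        return getattr( self.recarray, attr )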

    def __len__(self):
        return self.recarray.__len__()

    def __add__(self,other):
        return self.recarray.__add__(other)

    def __sub__(self,other):
        return self.recarray.__sub__(other)

    def __mul__(self,other):
        return self.recarray.__mul__(other)

    def __rmul__(self,other):
        return self.recarray.__rmul__(other)

    def __getitem__(self,i):
        return self.recarray.__getitem__(i)

    def __str__(self):
        return self.recarray.__str__()

    def __repr__(self):
        return self.recarray.__repr__()

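    def __setitem__(self, i, x):
        names = self.recarray.dtype.names
        # integer/slice indices and existing field names go straight through
        if not isinstance(i, str) or i in names:
            self.recarray.__setitem__(i,x)
        else:
            # keep the existing fields in their declared order
            keys = list(names)
            formats = [self.recarray.dtype[name] for name in keys]
            keys.append( i )
            # the new field reuses the dtype of the last existing field
            formats.append( formats[-1] )
            new = np.recarray( shape = self.shape,
                              dtype = {'names'  : keys,
                                       'formats': formats} )
            for k in keys[:-1]:
                new[k] = self.recarray[k]
            new[i] = x
            # swap the wrapped array; the wrapper object itself stays the same
            self.recarray = new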
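To show the intended usage, here is a trimmed-down, self-contained sketch of the same wrapper idea. The class name `FieldWrapper` is hypothetical, only the item access methods are kept, and, as in the class above, the new field is assumed to take the dtype of the last existing field:

```python
import numpy as np


class FieldWrapper(object):  # hypothetical, reduced version of the wrapper
    def __init__(self, *args, **kwargs):
        self.recarray = np.recarray(*args, **kwargs)

    def __getitem__(self, i):
        return self.recarray[i]

    def __setitem__(self, name, values):
        old = self.recarray
        # Existing fields (and integer/slice indices) are assigned directly.
        if not isinstance(name, str) or name in old.dtype.names:
            old[name] = values
            return
        # Unknown field name: allocate a wider array and copy the old fields.
        descr = [(n, old.dtype[n]) for n in old.dtype.names]
        descr.append((name, descr[-1][1]))  # assumption: reuse last dtype
        new = np.recarray(old.shape, dtype=descr)
        for n in old.dtype.names:
            new[n] = old[n]
        new[name] = values
        self.recarray = new  # swap the wrapped array


x = FieldWrapper((2,), dtype={'names': ['A', 'B'], 'formats': ['f8', 'f8']})
x['A'] = [1, 2]
x['B'] = [3, 4]
x['C'] = [4, 4]  # dictionary-like addition of a new field
print(x.recarray.dtype.names)
print(x['C'])
```

Note that every new field copies the whole array, so adding many fields one by one can become expensive for large arrays.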
Saullo G. P. Castro