How do you remove a column from a structured numpy array?

Question

Imagine you have a structured numpy array, generated from a CSV file whose first row holds the field names. The array's dtype has the form:

dtype([('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ..., ('n','<f8'])

Now, let's say you want to remove the i-th column from this array. Is there a convenient way to do that?

I'd like it to work like delete:

new_array = np.delete(old_array, 'i')

Any ideas?

score 21 · Accepted Answer · edited May 23 '17 at 12:01

It's not quite a single function call, but the following shows one way to drop the i-th field:

In [67]: a
Out[67]: 
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], 
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

In [68]: i = 1   # Drop the 'B' field

In [69]: names = list(a.dtype.names)

In [70]: names
Out[70]: ['A', 'B', 'C']

In [71]: new_names = names[:i] + names[i+1:]

In [72]: new_names
Out[72]: ['A', 'C']

In [73]: b = a[new_names]

In [74]: b
Out[74]: 
array([(1.0, 3.0), (4.0, 6.0)], 
      dtype=[('A', '<f8'), ('C', '<f8')])
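On newer NumPy releases the last output may look different from the session above. As far as I know, since NumPy 1.16 indexing with a list of field names returns a view that keeps the original record layout (with a gap where 'B' used to be), so the printed dtype shows offsets and a larger itemsize. A minimal sketch, assuming NumPy 1.16 or later, of how numpy.lib.recfunctions.repack_fields can turn that result into a packed array like the one in Out[74]:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

# Multi-field indexing; on NumPy >= 1.16 this keeps the original layout.
b = a[['A', 'C']]
print(b.dtype)

# repack_fields removes the padding left behind by the dropped field.
packed = rfn.repack_fields(b)
print(packed.dtype)           # two packed float64 fields
print(packed.dtype.itemsize)  # 16 bytes per record
```

Repacking matters mostly when the array is written to disk or passed to code that expects a contiguous record layout.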

Wrapped up as a function:

def remove_field_num(a, i):
    names = list(a.dtype.names)
    new_names = names[:i] + names[i+1:]
    b = a[new_names]
    return b
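One caveat with the slicing form: a negative i does not behave like ordinary negative indexing, because names[i+1:] becomes names[0:] when i is -1 and the field list ends up with duplicates. A hedged sketch of a variant (remove_field_at is a hypothetical name, not part of NumPy) that uses del instead, so negative indices work and an out-of-range index raises IndexError:

```python
import numpy as np

def remove_field_at(a, i):
    """Hypothetical variant of remove_field_num that accepts negative indices."""
    names = list(a.dtype.names)
    del names[i]  # raises IndexError if i is out of range
    return a[names]

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

print(remove_field_at(a, 1).dtype.names)   # ('A', 'C')
print(remove_field_at(a, -1).dtype.names)  # ('A', 'B')
```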

It might be more natural to remove a given field name:

def remove_field_name(a, name):
    names = list(a.dtype.names)
    if name in names:
        names.remove(name)
    b = a[names]
    return b
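A short usage example of the name-based version, with the function repeated so the snippet runs on its own. Note that a name that is not present is silently ignored and all fields are kept; whether that is desirable depends on the caller.

```python
import numpy as np

def remove_field_name(a, name):
    names = list(a.dtype.names)
    if name in names:
        names.remove(name)
    return a[names]

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

print(remove_field_name(a, 'B').dtype.names)  # ('A', 'C')
print(remove_field_name(a, 'Z').dtype.names)  # ('A', 'B', 'C'), nothing removed
```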

Also, check out the rec_drop_fields function in matplotlib's mlab module. Note that it is only available in older matplotlib releases; it has since been deprecated and removed.
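NumPy itself also ships a helper for this, numpy.lib.recfunctions.drop_fields. A minimal sketch of how it can be used; usemask=False is passed here because the function returns a masked array by default:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

# Drop a single field; a list of names drops several at once.
b = rfn.drop_fields(a, 'B', usemask=False)
print(b.dtype.names)  # ('A', 'C')

c = rfn.drop_fields(a, ['A', 'B'], usemask=False)
print(c.dtype.names)  # ('C',)
```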

Update: See my answer at How to remove a column from a structured numpy array *without copying it*? for a method to create a view of subsets of the fields of a structured array without making a copy of the array.
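For what it's worth, on recent NumPy versions (1.16 and later, as far as I know) plain multi-field indexing already returns a view rather than a copy. The sketch below is not necessarily the method from the linked answer; it only checks, under that version assumption, that the result shares memory with the original array:

```python
import numpy as np

a = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

v = a[['A', 'C']]               # assumed to be a view on NumPy >= 1.16
print(np.shares_memory(a, v))

# Writing through the view changes the original array when it is a view.
v['A'][0] = 99.0
print(a['A'][0])
```

If an independent array is needed instead, calling .copy() on the result is one way to detach it from the original.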

score 7 · Answer 2 · answered Jan 07 '16 at 03:38

Having googled my way here and learned what I needed to know from Warren's answer, I couldn't resist posting a more succinct version, with the added option to remove multiple fields efficiently in one go:

def rmfield(a, *fieldnames_to_remove):
    return a[[name for name in a.dtype.names if name not in fieldnames_to_remove]]

Examples:

a = rmfield(a, 'foo')
a = rmfield(a, 'foo', 'bar')  # remove multiple fields at once
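A self-contained version of those examples, assuming a small array with hypothetical fields 'foo', 'bar' and 'baz':

```python
import numpy as np

def rmfield(a, *fieldnames_to_remove):
    return a[[name for name in a.dtype.names if name not in fieldnames_to_remove]]

# Hypothetical sample data; the field names are only for illustration.
a = np.zeros(3, dtype=[('foo', '<f8'), ('bar', '<i4'), ('baz', '<f8')])

print(rmfield(a, 'foo').dtype.names)         # ('bar', 'baz')
print(rmfield(a, 'foo', 'bar').dtype.names)  # ('baz',)
```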

Or if we're really going to golf it, the following is equivalent:

rmfield=lambda a,*f:a[[n for n in a.dtype.names if n not in f]]

Answered by jez

Your second solution is quite ugly, if I may say so. In particular, I don't like your use of a lambda expression for what is in effect a function declaration. It is not good style and is hard to read. Others seem to agree with me: http://stackoverflow.com/a/134638/1375015 – Konstantin Schubert May 06 '16 at 18:12

Perhaps you didn't read the phrase "if we're really going to golf it".... The aim of "code golf" is to create the shortest code irrespective of readability and that almost never fails to be ugly. – jez May 06 '16 at 18:48

I didn't know about that phrase. I still don't see the point, but in that context maybe my response was a bit harsh. – Konstantin Schubert May 06 '16 at 18:57

Check it out, it has its own stack at codegolf.stackexchange.com :-) – jez May 06 '16 at 18:58
