
Given a structured numpy array, I want to remove certain columns by name without copying the array. I know I can do this:

names = list(a.dtype.names)
if name_to_remove in names:
    names.remove(name_to_remove)
a = a[names]

But this creates a temporary copy of the array, which I want to avoid because the array I am dealing with might be very large.

Is there a good way to do this?

Konstantin Schubert
  • If you want to avoid using the "names" list, you may write a lambda function which does this operation. – Ani Menon May 06 '16 at 18:45
  • The problem is that `a[names]` creates a copy of the original array, assigns it to a and only then deletes the original array. I want to avoid that copy. Maybe I should clarify my question somehow? – Konstantin Schubert May 06 '16 at 18:59
  • You are talking about creation of the "names" list right? – Ani Menon May 06 '16 at 19:01
  • In general I don't think it's possible for much the same reason that you can't remove arbitrary rows or columns from a 2D numpy array without generating a copy. Structured numpy arrays are backed by contiguous blocks of memory, where elements in adjacent fields reside at adjacent addresses. If you wanted to remove an arbitrary field from the middle of the array, you would need to "shift over" the elements in all of the fields after it, which would require a copy. – ali_m May 06 '16 at 19:43
  • @ali_m: Actually the fields do not have to be adjacent. See my answer. – Warren Weckesser May 06 '16 at 21:28
  • @WarrenWeckesser Huh, I didn't know that you could specify arbitrary byte offsets for the fields. Very cool. – ali_m May 06 '16 at 21:36
  • While indexing an individual field gives a view, using the list of field names gives a copy, sort of like regular indexing with a list of integers. But overall that mechanism is not as well developed. If you regularly need to access a group of fields with the same dtype, consider making them a single field with an array dimension. – hpaulj May 07 '16 at 02:55
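
The last suggestion above (one field with an array dimension instead of several same-typed fields) could look roughly like this. It is only a sketch; the field names `xy` and `i` are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical layout: two float columns stored as one field of shape (2,),
# plus an integer field.
dt = np.dtype([('xy', '<f8', (2,)), ('i', '<i8')])
a = np.zeros(4, dtype=dt)

# Indexing a single field returns a view, here with shape (4, 2).
xy = a['xy']
xy[:, 0] = 1.5   # writes through to `a`

print(xy.shape)
print(np.shares_memory(a, xy))
```

Because `a['xy']` is a single-field access, the group of columns can be read and written without a copy, at the cost of losing the individual column names.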

1 Answer


You can create a new data type containing just the fields that you want, with the same field offsets and the same itemsize as the original array's data type, and then use this new data type to create a view of the original array. The dtype function handles arguments with many formats; the relevant one is described in the section of the documentation called "Specifying and constructing data types". Scroll down to the subsection that begins with

{'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}
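
As a minimal sketch of that dictionary form (the field names and the choice to keep `x` and `j` are just assumptions for illustration), the new dtype reuses the original offsets and itemsize, so the kept fields stay at the same byte positions within each record:

```python
import numpy as np

# Original layout: four 8-byte fields packed back to back (itemsize 32).
full = np.dtype([('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

# Keep only 'x' and 'j'. Offsets and itemsize are copied from `full`,
# so each record still spans 32 bytes and 'j' still starts at byte 24.
partial = np.dtype({'names': ['x', 'j'],
                    'formats': ['<f8', '<i8'],
                    'offsets': [full.fields['x'][1], full.fields['j'][1]],
                    'itemsize': full.itemsize})

a = np.zeros(3, dtype=full)
a['j'] = [7, 8, 9]

# Reinterpret the same buffer with the reduced dtype; no data is copied.
v = a.view(partial)
print(v['j'])
```

The bytes belonging to `y` and `i` are simply skipped over by the new dtype; they remain in memory.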

Here are a couple of convenience functions that use this idea.

import numpy as np


def view_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to keep.

    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    formats = [dt.fields[name][0] for name in names]
    offsets = [dt.fields[name][1] for name in names]
    itemsize = a.dtype.itemsize
    newdt = np.dtype(dict(names=names,
                          formats=formats,
                          offsets=offsets,
                          itemsize=itemsize))
    b = a.view(newdt)
    return b


def remove_fields(a, names):
    """
    `a` must be a numpy structured array.
    `names` is the collection of field names to remove.

    Returns a view of the array `a` (not a copy).
    """
    dt = a.dtype
    keep_names = [name for name in dt.names if name not in names]
    return view_fields(a, keep_names)

For example,

In [297]: a
Out[297]: 
array([(10.0, 13.5, 1248, -2), (20.0, 0.0, 0, 0), (30.0, 0.0, 0, 0),
       (40.0, 0.0, 0, 0), (50.0, 0.0, 0, 999)], 
      dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

In [298]: b = remove_fields(a, ['i', 'j'])

In [299]: b
Out[299]: 
array([(10.0, 13.5), (20.0, 0.0), (30.0, 0.0), (40.0, 0.0), (50.0, 0.0)], 
      dtype={'names':['x','y'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':32})

Verify that b is a view (not a copy) of a by changing b[0]['x']...

In [300]: b[0]['x'] = 3.14

and seeing that a is also changed:

In [301]: a[0]
Out[301]: (3.14, 13.5, 1248, -2)
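
Another way to check this, sketched here with the dtype construction from `view_fields` inlined so the snippet runs on its own, is to ask NumPy directly whether the two arrays share memory. The sketch also shows that the view does not shrink the footprint: the removed fields are hidden, not freed.

```python
import numpy as np

a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

# Same construction as view_fields, inlined for a self-contained example.
keep = ['x', 'y']
newdt = np.dtype({'names': keep,
                  'formats': [a.dtype.fields[n][0] for n in keep],
                  'offsets': [a.dtype.fields[n][1] for n in keep],
                  'itemsize': a.dtype.itemsize})
b = a.view(newdt)

shares = np.shares_memory(a, b)   # do a and b overlap in memory?
is_base = b.base is a             # is b a direct view of a?
same_size = b.nbytes == a.nbytes  # the view spans the same bytes as a

print(shares, is_base, same_size)
```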
Warren Weckesser
  • Unfortunately, this doesn't work for the object dtype: `TypeError: Cannot change data-type for object array.` – mapf Oct 13 '21 at 12:17
  • The post v1.16 multifield indexing does the same as `view_fields`. – hpaulj Oct 13 '21 at 22:51
  • @mapf, it's not a good idea to post a new error in a comment. Errors are best addressed with context (a minimal reproducible example) and traceback. – hpaulj Oct 13 '21 at 22:52
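
To illustrate the comment about multifield indexing after v1.16, here is a small sketch assuming NumPy 1.16 or later: indexing with a list of field names returns a view whose dtype keeps the original offsets and itemsize, much like `view_fields`. If a packed copy is wanted instead, `numpy.lib.recfunctions.repack_fields` can produce one.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

# Multifield indexing: a view on NumPy >= 1.16 (a copy on older versions).
b = a[['x', 'y']]
shares = np.shares_memory(a, b)

# Explicitly request a packed copy without the padding left by 'i' and 'j'.
packed = rfn.repack_fields(b)

print(np.__version__, shares)
print(b.dtype.itemsize, packed.dtype.itemsize)
```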