
To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.

Is there a way to add a field to a structured ndarray, without duplicating memory? That means, either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old?

Solutions that do duplicate RAM:

There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which works for a subset of the original data, but I'm not sure whether it can be adapted to add fields instead.

gerrit
  • If your array is a view on a buffer of which the last half is not used, you might be able to allocate the extra fields in the last half (rather than adjacent to their existing row). – Eric Oct 11 '16 at 00:09

1 Answer


A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are, thus, a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.

Adding a field, say an i4 field to a dtype that is 20 bytes long, means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying the values from the old fields (and the new one) into it.
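To make the record-length argument concrete, here is a minimal sketch. The field names and the 20-byte layout are made up for illustration; the point is only that the new array has a larger itemsize and its own buffer.

```python
import numpy as np

# Hypothetical 20-byte record: five 4-byte fields.
old_dtype = np.dtype([("a", "i4"), ("b", "f4"), ("c", "f4"), ("d", "f4"), ("e", "f4")])
old = np.zeros(3, dtype=old_dtype)
old["a"] = [1, 2, 3]

# Appending an i4 field grows every record from 20 to 24 bytes.
new_dtype = np.dtype(old_dtype.descr + [("extra", "i4")])
new = np.empty(old.shape, dtype=new_dtype)

# Copy the existing fields one by one, then fill the new field.
for name in old_dtype.names:
    new[name] = old[name]
new["extra"] = 0

print(old_dtype.itemsize, new_dtype.itemsize)  # 20 24
print(np.shares_memory(old, new))              # False
```

Until `old` is released, both buffers exist at the same time, which is exactly the duplication the question wants to avoid.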

Actually, even adding a new record to the array, i.e. concatenating another array onto it, would still require creating a new data buffer. Arrays have a fixed size.

Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.

Maybe you should be using a dictionary, with each item being an array. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
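A rough sketch of that dictionary layout, with invented column names, might look like this. Adding a key leaves the existing arrays untouched, while reading one 'row' has to visit every column.

```python
import numpy as np

# One independent array per field (names are illustrative only).
columns = {
    "a": np.arange(5, dtype="i4"),
    "b": np.linspace(0.0, 1.0, 5),
}

a_before = columns["a"]

# Adding a "field" is just adding a key; existing arrays are not copied.
columns["extra"] = np.zeros(5, dtype="i4")
print(columns["a"] is a_before)  # True

# Row access has to gather one element from every column.
row = {name: col[2] for name, col in columns.items()}
```

The trade-off is that the columns no longer form a single array, so anything that expects a structured ndarray needs a conversion, and that conversion copies.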

hpaulj
  • Hmm, ok. I need a different approach then. Perhaps I could cut the big array in N pieces, adding fields to the smaller pieces one at a time, so that I still copy everything, but not all at once, thus limiting peak memory usage. – gerrit Oct 10 '16 at 23:01
  • Still, it should be possible to do it in a memory friendly way without duplicating data. It could copy data to a new array and grow the new array size while simultaneously decreasing the old array size. – Bastiaan Feb 22 '17 at 19:18
  • Each 'grow' and 'decrease' requires a data copy. Don't worry about 'memory friendly' unless it is really hurting execution times or you get 'memory error' problems. But don't conflate those two problems. – hpaulj Feb 22 '17 at 21:50
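
The piecewise idea from the first comment could be sketched roughly as below. This assumes the data already lives in separately allocated pieces; slices of one big array are views that keep the whole original buffer alive, so converting them one by one would not lower peak memory. The helper `add_field` is hypothetical, not a numpy function.

```python
import numpy as np

def add_field(piece, name, fmt):
    """Return a copy of `piece` with one extra zero-filled field (hypothetical helper)."""
    new_dtype = np.dtype(piece.dtype.descr + [(name, fmt)])
    out = np.empty(piece.shape, dtype=new_dtype)
    for field in piece.dtype.names:
        out[field] = piece[field]
    out[name] = 0
    return out

old_dtype = np.dtype([("a", "i4"), ("b", "f8")])

# Assumption: each piece owns its own buffer.
pieces = [np.zeros(1000, dtype=old_dtype) for _ in range(4)]

for i in range(len(pieces)):
    # Rebinding the list slot drops the last reference to the old piece,
    # so its buffer can be freed before the next piece is converted.
    pieces[i] = add_field(pieces[i], "extra", "i4")
```

Every value is still copied once, but the extra memory needed at any moment is about one piece rather than the whole array.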