
Sometimes you want to overwrite the original numpy array with the concatenated one. I want to discuss an example case: a numpy array stored inside a complex structured array. This question arose while answering a question about structured arrays: https://stackoverflow.com/a/27563022/2062965

import numpy as np
x = np.zeros(1, dtype = [('Table', np.float64, (2, 2)),
                         ('Number', float),
                         ('String', '|S10')])

# Append values along the last axis of the 'Table' field (works, returns a new array)
print(np.append(x['Table'], np.array([[[1], [2]]]), axis=2))

# This assignment will lead to the error message mentioned below:
x['Table'] = np.append(x['Table'], np.array([[[1], [2]]]), axis=2)

Similar questions

There are several ways to concatenate arrays, such as numpy.append, numpy.concatenate, numpy.vstack or numpy.hstack.

Each of them creates a new array, which cannot be assigned back into the original field; the assignment fails with an error message like the following one:

ValueError: could not broadcast input array from shape (1,2,3) into shape (1,2,2)

Possible approach

As a straightforward but time-consuming solution, I could define a new empty numpy array and fill it with the old data plus the values that should be appended.

Other solutions are also welcome, thank you.
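A minimal sketch of that approach, assuming the new 'Table' shape (2, 3) is known in advance; the names `extra`, `new_dt` and `y` are illustrative only. The copy cannot be avoided because the record size (the dtype's itemsize) grows from 50 to 66 bytes.

```python
import numpy as np

x = np.zeros(1, dtype=[('Table', np.float64, (2, 2)),
                       ('Number', float),
                       ('String', 'S10')])

# Values to append along the last axis of 'Table'; shape (1, 2, 1).
extra = np.array([[[1], [2]]])

# New record layout: identical except that 'Table' has one more column.
new_dt = np.dtype([('Table', np.float64, (2, 3)),
                   ('Number', float),
                   ('String', 'S10')])

# Allocate the larger array, then fill it field by field.
y = np.zeros(x.shape, dtype=new_dt)
y['Table'] = np.append(x['Table'], extra, axis=2)   # (1, 2, 3) now fits
for name in x.dtype.names:
    if name != 'Table':
        y[name] = x[name]

# Rebind the name; the old buffer is released once nothing refers to it.
x = y
```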

strpeter
  • If I understand numpy right, it puts its data into contiguous memory blocks. So I'm not sure that numpy is well suited for your task. You might want to look at pytables (http://www.pytables.org), which is designed for dealing with hierarchical data. – Dietrich Dec 23 '14 at 13:33
  • @Dietrich: I don't know `numpy` that well... Even though it does not answer the question directly, could you please go into a bit more detail? Just to keep in mind: I do not want to write this array to a file (like h5 or txt). It is just for internal manipulations. – strpeter Dec 23 '14 at 13:42
  • Appending is efficient if your internal data structure is something like a linked list (I'm not sure, but I think python's lists are implemented that way). If you enlarge a numpy array, it will have to copy your data to a new memory location, since it cannot expect to find free memory behind its current block. So I think that there's not really a more efficient way than the one you suggested - unless you use a different data structure: Depending on the data size, a python list or dict may be sufficient. For large arrays (significant portion of your RAM), I would suggest pytables or a database. – Dietrich Dec 23 '14 at 14:12
  • @Dietrich: If I understand you correctly, it is more efficient to use lists of lists when extending them. You also suggest pytables or a database but only if it might exceed my RAM? I do not want to discuss this case here. – strpeter Dec 23 '14 at 14:44
  • Exactly - my standard strategy for successively building up arrays is to do it with a list and, when done, convert the list to a numpy array. In my experience the speed losses compared to preallocating a large enough chunk and moving the raw data around (see `numpy.getbuffer()`) are in most cases negligible. – Dietrich Dec 23 '14 at 16:48
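A minimal sketch of the list-then-convert strategy from these comments, contrasted with growing an array by repeated np.append. The generator `produce_rows` is hypothetical and only stands in for whatever produces the data. (For the record, CPython lists are over-allocated dynamic arrays rather than linked lists, which is what makes `list.append` cheap on average.)

```python
import numpy as np

def produce_rows(n):
    """Hypothetical data source: yields n rows of two integers."""
    for i in range(n):
        yield [i, i * i]

# Strategy from the comments: grow a Python list, convert once at the end.
rows = []
for row in produce_rows(1000):
    rows.append(row)
arr = np.array(rows)

# For comparison: np.append allocates and copies a new array on every call,
# so the total work grows quadratically with the number of rows.
slow = np.empty((0, 2), dtype=arr.dtype)
for row in produce_rows(1000):
    slow = np.append(slow, [row], axis=0)
```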

1 Answer


A numpy array keeps its data in a fixed-size buffer. Attributes like shape, strides and dtype are used to interpret that data. Those attributes can be changed, and values within the data buffer can be changed. But anything that changes the size of the buffer requires a copy.
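A small illustration of that distinction (a sketch, not taken from the original answer): changing only the shape gives a view on the same buffer, while np.append has to allocate a new one.

```python
import numpy as np

a = np.arange(6.0)

# Changing shape only reinterprets the same buffer: reshape returns a view.
b = a.reshape(2, 3)
print(np.shares_memory(a, b))   # True

# Writing through the view changes the original buffer.
b[0, 0] = 99.0
print(a[0])                     # 99.0

# Growing the data needs a bigger buffer, so np.append returns a new array.
c = np.append(a, [6.0, 7.0])
print(np.shares_memory(a, c))   # False
```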

np.append, np.concatenate, etc. all create a new array and fill it with data from the original arrays.

Your append action creates a new (1,2,3) array. It cannot replace the (1,2,2) block of bytes in x's buffer, because the size of each record in x is fixed by its dtype.

If ('Table', float64, (2, 2)) was replaced by ('Table', object), then x['Table'] could be changed. That's because x now contains a pointer to a separate array. The assignment replaces one pointer with another, without changing the size of the x buffer. It's like changing a value in a dictionary, or replacing a nested list within a list.

Why are you trying to use a structured array rather than conventional Python structures like list, dict or a custom class object?
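For comparison, a minimal sketch of the custom-class alternative; the class name `Record` and its attribute names are hypothetical. Each attribute holds a reference, so it can be rebound to an array of any size.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Record:
    # Attributes are references, so rebinding them never touches a shared buffer.
    table: np.ndarray = field(default_factory=lambda: np.zeros((2, 2)))
    number: float = 0.0
    string: str = ''

r = Record()
# The question's append along the last axis: the (2, 2) table becomes (2, 3).
r.table = np.append(r.table, [[1], [2]], axis=1)
```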

Here's a sequence that works:

In [116]: x = np.zeros(1, dtype = [('Table', 'O'),
                         ('Number', float),
                         ('String', '|S10')])

In [117]: x['Table'][0] = np.zeros((2,2),dtype=np.float64)

In [118]: x['Table'][0] = np.append(x['Table'][0], np.array([[[1], [2]]]))

In [119]: x
Out[119]: 
array([([0.0, 0.0, 0.0, 0.0, 1.0, 2.0], 0.0, '')], 
      dtype=[('Table', 'O'), ('Number', '<f8'), ('String', 'S10')])

But notice that I have to assign the new arrays to x['Table'][0] - a 'row' within the 'Table' field.

In [120]: x['Table']
Out[120]: array([array([ 0.,  0.,  0.,  0.,  1.,  2.])], dtype=object)

x['Table'] is an object array whose elements are themselves separate numpy arrays.
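Without an axis argument, np.append flattens its inputs, which is why the result above is 1-d. A sketch (not from the original answer) that keeps the 2-d shape by appending a column with axis=1:

```python
import numpy as np

x = np.zeros(1, dtype=[('Table', 'O'),
                       ('Number', float),
                       ('String', 'S10')])

# The object field stores a reference to a separate (2, 2) array.
x['Table'][0] = np.zeros((2, 2), dtype=np.float64)

# axis=1 appends a column instead of flattening, giving a (2, 3) array.
x['Table'][0] = np.append(x['Table'][0], [[1], [2]], axis=1)
```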

Looking back at your original x definition, let's give it 3 'rows' (elements):

In [132]: x = np.zeros(3, dtype = [('Table', np.float64, (2, 2)),
                         ('Number', float),
                         ('String', '|S10')])

In [133]: x
Out[133]: 
array([([[0.0, 0.0], [0.0, 0.0]], 0.0, ''),
       ([[0.0, 0.0], [0.0, 0.0]], 0.0, ''),
       ([[0.0, 0.0], [0.0, 0.0]], 0.0, '')], 
      dtype=[('Table', '<f8', (2, 2)), ('Number', '<f8'), ('String', 'S10')])

In [134]: x['Table'].shape
Out[134]: (3, 2, 2)

The data buffer for x is a sequence of 3 records, each holding 5 float 0s (4 for 'Table', 1 for 'Number') followed by 10 zero bytes for 'String'. When I ask for x['Table'] it gives me a non-contiguous view of 12 of those 0s, with a (3,2,2) shape.
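The layout can be inspected directly. This sketch assumes the default packed (unaligned) dtype, so each record is 4*8 + 8 + 10 = 50 bytes:

```python
import numpy as np

dt = np.dtype([('Table', np.float64, (2, 2)),
               ('Number', float),
               ('String', 'S10')])
x = np.zeros(3, dtype=dt)

# Record size and the byte offset of each field within a record.
print(dt.itemsize)                                       # 50
print({name: dt.fields[name][1] for name in dt.names})   # {'Table': 0, 'Number': 32, 'String': 40}

# x['Table'] steps 50 bytes from record to record, so it is a
# non-contiguous view into x's buffer.
t = x['Table']
print(t.shape, t.strides)         # (3, 2, 2) (50, 16, 8)
print(t.flags['C_CONTIGUOUS'])    # False
print(np.shares_memory(t, x))     # True
```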

I can change elements of that array:

In [137]: x['Table'][0,0,:]=[1,1]

But I can't expand it in any way - not without making a new x array.
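A sketch of both sides of this (not from the original answer): in-place writes through the field view land in x's buffer, but the view does not own its data, so ndarray.resize refuses to grow it. The exact error text varies between NumPy versions, so only the exception type is relied on here.

```python
import numpy as np

x = np.zeros(3, dtype=[('Table', np.float64, (2, 2)),
                       ('Number', float),
                       ('String', 'S10')])

t = x['Table']
t[0, 0, :] = [1, 1]           # in-place writes go straight into x's buffer
print(x[0]['Table'][0])       # [1. 1.]

# The field view does not own its buffer, so it cannot be resized in place.
print(t.flags['OWNDATA'])     # False
try:
    t.resize((3, 2, 3))
except ValueError as err:
    print('ValueError:', err)
```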


Another structure-like construct is a dictionary:

In [156]: x={'Table': np.zeros((1,2,2),dtype=np.float64),
             'Number':np.zeros((1,)), 
             'String':['']}

In [157]: x
Out[157]: 
{'Number': array([ 0.]),
 'String': [''],
 'Table': array([[[ 0.,  0.],
        [ 0.,  0.]]])}

In [158]: x['Table'] =np.append(x['Table'],[1,2])

In [159]: x
Out[159]: 
{'Number': array([ 0.]),
 'String': [''],
 'Table': array([ 0.,  0.,  0.,  0.,  1.,  2.])}
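Because a dictionary value is just a reference, the question's original append along axis=2 also works here and keeps the 3-d shape; a minimal sketch:

```python
import numpy as np

x = {'Table': np.zeros((1, 2, 2), dtype=np.float64),
     'Number': np.zeros((1,)),
     'String': ['']}

# The question's operation: append a column along axis 2, then rebind the key.
x['Table'] = np.append(x['Table'], np.array([[[1], [2]]]), axis=2)
print(x['Table'].shape)   # (1, 2, 3)
```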

Complex data structures like this make the most sense when they are read from a CSV (or similar delimited text) file. For example:

In [161]: dt = np.dtype([('Table', np.float64, (2, 2)),
                         ('Number', float),
                         ('String', '|S10')])

In [162]: txt="""0 0 0 0 0 astring
   .....: 1 2 3 4 0 another
   .....: 1 1 1 1 10 end
   .....: """

In [163]: A=np.genfromtxt(txt.splitlines(),dtype=dt)

In [164]: A
Out[164]: 
array([([[0.0, 0.0], [0.0, 0.0]], 0.0, 'astring'),
       ([[1.0, 2.0], [3.0, 4.0]], 0.0, 'another'),
       ([[1.0, 1.0], [1.0, 1.0]], 10.0, 'end')], 
      dtype=[('Table', '<f8', (2, 2)), ('Number', '<f8'), ('String', 'S10')])

genfromtxt reads the lines, parses them into a list of lists, and only at the end does it pack them into the structured array.
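The same collect-then-pack pattern can be written by hand. This is a simplified sketch of the idea, not genfromtxt's actual implementation; it assumes every line has exactly four table values, one number and one short ASCII string.

```python
import numpy as np

dt = np.dtype([('Table', np.float64, (2, 2)),
               ('Number', float),
               ('String', 'S10')])

txt = """0 0 0 0 0 astring
1 2 3 4 0 another
1 1 1 1 10 end
"""

# Parse every line into a plain tuple first; nothing is packed yet.
rows = []
for line in txt.splitlines():
    if not line.strip():
        continue
    fields = line.split()
    table = [float(v) for v in fields[:4]]
    rows.append(([table[0:2], table[2:4]], float(fields[4]),
                 fields[5].encode('ascii')))

# Pack all records into the structured array in a single step.
A = np.array(rows, dtype=dt)
```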

hpaulj