86

In Numpy, I can concatenate two arrays end-to-end with np.append or np.concatenate:

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> Z = np.append(X, Y, axis=0)
>>> Z
array([[ 1,  2,  3],
       [-1, -2, -3],
       [ 4,  5,  6]])

But these make copies of their input arrays:

>>> Z[0,:] = 0
>>> Z
array([[ 0,  0,  0],
       [-1, -2, -3],
       [ 4,  5,  6]])
>>> X
array([[1, 2, 3]])

Is there a way to concatenate two arrays into a view, i.e. without copying? Would that require an np.ndarray subclass?

Fred Foo
  • Why do you want to have a view rather than a copy? – Winston Ewert Oct 23 '11 at 21:03
  • 3
    @WinstonEwert: I have a long list of arrays on which I want to perform a single, global normalization. – Fred Foo Oct 23 '11 at 21:09
  • list comprehension will be fast, too. – cyborg Oct 23 '11 at 21:30
  • That doesn't answer the question, what's wrong with copying all those arrays? Basically, are you concerned about the cost of copying, or do you want to modify the original arrays? – Winston Ewert Oct 23 '11 at 22:21
  • 2
    @WinstonEwert: the cost of copying is the problem; otherwise I could just `concatenate` them and replace the original arrays with views into the concatenation. Looks like that's what I'll have to do, though. – Fred Foo Oct 23 '11 at 22:32
  • 1
    There may be a better way, but without a better overall picture of what you are doing, I can't say. As it is, it may be faster to normalize the arrays individually rather than concat/split. – Winston Ewert Oct 23 '11 at 22:34
  • @WinstonEwert: that's impossible. I think I'll adapt my input routines to allocate everything in one go and then split my huge array into a list of views later on. That seems like the simplest option. – Fred Foo Oct 23 '11 at 22:43

6 Answers

94

The memory belonging to a NumPy array must form a single block that can be addressed with fixed strides (contiguous, in the simplest case). Arrays that were allocated separately end up at unrelated locations in memory, so there is no way to represent them together as a single NumPy array view.
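To make the memory model concrete, here is a minimal sketch showing that a view is described by the same buffer plus a shape and per-axis strides. The array names and sizes are illustrative assumptions, not part of the original answer:

```python
import numpy as np

# One buffer of 3 x 4 int64 values, owned by `big`.
big = np.zeros((3, 4), dtype=np.int64)
print(big.strides)               # (32, 8): bytes to step per row / per column

# A view reuses the same buffer with a different offset, shape and strides.
view = big[::2, 1:]
print(view.strides)              # (64, 8)
print(view.base is big)          # True: no data was copied

# Two separately allocated arrays live in unrelated buffers, so no single
# (offset, shape, strides) description can cover both of them.
x = np.zeros(3)
y = np.zeros(3)
print(np.shares_memory(x, y))    # False
```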

If you know in advance how many arrays you need, you can instead allocate one big array up front and make each of the small arrays a view into it (e.g. obtained by slicing).
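A minimal sketch of this approach, reusing the data from the question. The `lengths` list and the variable names are assumptions made for illustration:

```python
import numpy as np

lengths = [1, 2]                 # assumed row counts of the small arrays
n_cols = 3

# Allocate everything in one go.
big = np.empty((sum(lengths), n_cols))

# Slice the big array into consecutive row blocks; basic slicing returns views.
offsets = np.cumsum([0] + lengths)
X, Y = [big[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]

# Fill the small arrays in place (X[:] = ... writes into big's buffer).
X[:] = [[1, 2, 3]]
Y[:] = [[-1, -2, -3], [4, 5, 6]]

# Writing through the big array is now visible in the small one.
big[0, :] = 0
print(X)                         # [[0. 0. 0.]]
```

Note that the small arrays must be filled with `X[:] = ...` rather than rebound with `X = ...`, since rebinding the name would discard the view.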

pv.
  • 14
    Inconsequential remark: the memory of a view doesn't have to be contiguous but it probably has to be ordered in fixed strides (which is also not the case with a list of arrays). – cyborg Oct 23 '11 at 22:11
  • Are you saying that even a subclass won't work? I know people use `ndarray` subclasses to work with `mmap`'d arrays, but I guess memory mappings are also contiguous... – Fred Foo Oct 23 '11 at 22:12
  • 4
    Yep, subclasses must also adhere to NumPy's memory model. (@cyborg's comment above is also correct: the sub-arrays could be laid out in memory with fixed strides, but that, too, can be obtained only by arranging things beforehand.) Careful reading of [this page](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray) may shed some more light. – pv. Oct 25 '11 at 16:21
16

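Just initialize the array before you fill it with data. If you want, you can allocate more space than needed; on many systems this will not take up more RAM, because the operating system typically commits memory for a freshly zeroed array only when it is first written to.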

A = np.zeros((R, C))  # the shape must be passed as a single tuple
A[row] = data         # fill one row at a time

The memory is used only once data is put into the array. Creating a new array by concatenating two can be impractically slow on a large dataset, i.e. more than 1 GB or so.
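A minimal sketch of the over-allocation pattern described above, assuming a known upper bound on the number of rows. The names `capacity` and `n_filled` are hypothetical:

```python
import numpy as np

capacity, n_cols = 1000, 3       # assumed upper bound on rows
A = np.zeros((capacity, n_cols))

# Fill rows as data arrives, tracking how many rows are in use.
n_filled = 0
for row in ([1, 2, 3], [4, 5, 6]):
    A[n_filled] = row
    n_filled += 1

# Trim to the used part; this slice is a view, not a copy.
data = A[:n_filled]
print(data.shape)                # (2, 3)
```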

John
1

I had the same problem and ended up doing it in reverse: after concatenating normally (with a copy), I reassigned the original arrays to be views on the concatenated one:

import numpy as np

def concat_no_copy(arrays):
    """ Concats the arrays and returns the concatenated array 
    in addition to the original arrays as views of the concatenated one.

    Parameters:
    -----------
    arrays: list
        the list of arrays to concatenate
    """
    con = np.concatenate(arrays)

    viewarrays = []
    start = 0  # running offset along axis 0, avoids re-summing the lengths
    for arr in arrays:
        stop = start + len(arr)
        arrnew = con[start:stop]  # basic slicing returns a view of con
        viewarrays.append(arrnew)
        # array_equal also works for multi-dimensional arrays
        assert np.array_equal(arr, arrnew)
        start = stop

    # return the view arrays, replace the old ones with these
    return con, viewarrays

You can test it as follows:

def test_concat_no_copy():
    arr1 = np.array([0, 1, 2, 3, 4])
    arr2 = np.array([5, 6, 7, 8, 9])
    arr3 = np.array([10, 11, 12, 13, 14])

    arraylist = [arr1, arr2, arr3]

    con, newarraylist = concat_no_copy(arraylist)

    assert all(con == np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
                                11, 12, 13, 14]))

    for old, new in zip(arraylist, newarraylist):
        assert all(old == new)
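The test above checks values only. To confirm that the returned arrays really are views, one can also check memory sharing. A minimal self-contained sketch, using `np.split` as an alternative way to obtain the views; `concat_with_views` is a hypothetical name, not the function from the answer:

```python
import numpy as np

def concat_with_views(arrays):
    """Concatenate along axis 0 and return (con, list of views into con)."""
    con = np.concatenate(arrays)
    # Split points are the cumulative lengths, excluding the final total.
    offsets = np.cumsum([len(a) for a in arrays])[:-1]
    return con, np.split(con, offsets)

arr1 = np.array([0, 1, 2])
arr2 = np.array([3, 4])
con, (v1, v2) = concat_with_views([arr1, arr2])

con *= 10                            # modify the concatenated array in place
print(v1, v2)                        # [ 0 10 20] [30 40]
print(np.shares_memory(v1, con))     # True: v1 is a view of con
print(np.shares_memory(arr1, con))   # False: the original was copied
```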
architectonic
1

Not elegant at all, but you can get close to what you want by using a tuple to hold references to the arrays. I have no idea how I would use it in this case, but I have done things like this before.

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> z = (X, Y)
>>> z[0][:] = 0
>>> z
(array([[0, 0, 0]]), array([[-1, -2, -3],
       [ 4,  5,  6]]))
>>> X
array([[0, 0, 0]])
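One way a tuple of references could serve the global normalization mentioned in the question's comments is to compute the statistics across all arrays and then update each array in place. A minimal sketch, assuming that normalization means subtracting the global mean and dividing by the global standard deviation (the question does not say which normalization is meant); `normalize_in_place` is a hypothetical helper:

```python
import numpy as np

def normalize_in_place(arrays):
    """Normalize all arrays with one global mean/std, without concatenating.

    Assumes floating-point arrays; in-place division on integer arrays
    would raise a casting error.
    """
    total = sum(a.size for a in arrays)
    mean = sum(a.sum() for a in arrays) / total
    var = sum(((a - mean) ** 2).sum() for a in arrays) / total
    std = np.sqrt(var)
    for a in arrays:
        a -= mean    # in-place: modifies the caller's arrays
        a /= std
    return mean, std

X = np.array([[1.0, 2.0, 3.0]])
Y = np.array([[-1.0, -2.0, -3.0], [4.0, 5.0, 6.0]])
mean, std = normalize_in_place((X, Y))
```

This still allocates one temporary per array while computing the variance, but it never builds a copy of the whole dataset.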
Brian Larsen
0

You may create an array of arrays, like:

>>> from numpy import *
>>> a = array([1.0, 2.0, 3.0])
>>> b = array([4.0, 5.0])
>>> c = array([a, b], dtype=object)  # recent NumPy versions require dtype=object for ragged input
>>> c
array([[ 1.  2.  3.], [ 4.  5.]], dtype=object)
>>> a[0] = 100.0
>>> a
array([ 100.,    2.,    3.])
>>> c
array([[ 100.    2.    3.], [ 4.  5.]], dtype=object)
>>> c[0][1] = 200.0
>>> a
array([ 100.,  200.,    3.])
>>> c
array([[ 100.  200.    3.], [ 4.  5.]], dtype=object)
>>> c *= 1000
>>> c
array([[ 100000.  200000.    3000.], [ 4000.  5000.]], dtype=object)
>>> a
array([ 100.,  200.,    3.])
>>> # Oops! Copies were made...

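The problem is that operations applied to the whole object array, such as `c *= 1000`, store a newly computed array in each slot instead of modifying the original arrays in place, so `a` and `b` are left unchanged.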
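If the goal is to modify the original arrays through the container, one option is to loop over the elements and apply the in-place operation to each of them. A minimal sketch; the container is built with `np.empty(..., dtype=object)` here, an assumption made so that the example runs on recent NumPy versions:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])

# 1-D object array whose elements are references to a and b.
c = np.empty(2, dtype=object)
c[0], c[1] = a, b

# In-place operation on each element modifies the original arrays.
for arr in c:
    arr *= 1000

print(a)            # [1000. 2000. 3000.]
print(c[0] is a)    # True: the container still refers to a
```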

e.tadeu
0

This answer is based on my other answer to "Reference to ndarray rows in ndarray".

X = np.array([[1,2,3]])
Y = np.array([[-1,-2,-3],[4,5,6]])
Z = np.array([None, None, None])
Z[0] = X[0]
Z[1] = Y[0]
Z[2] = Y[1]

Z[0][0] = 5  # X is changed as well, because Z[0] is a view of X[0]

print(X)
# Output:
# [[5 2 3]]

# Let's make it a function!
def concat(X, Y, copy=True):
    """Return an object array of row references if copy=False."""
    if copy:  # np.append copies both inputs into a new array
        return np.append(X, Y, axis=0)
    len_x, len_y = len(X), len(Y)
    # 1-D object array; each element will hold a view of one input row
    ret = np.empty(len_x + len_y, dtype=object)
    for i in range(len_x):
        ret[i] = X[i]
    for j in range(len_y):
        ret[len_x + j] = Y[j]
    return ret
Tai