
Let’s say I have two NumPy arrays, a and b:

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])

b = np.array([8,9])

And I would like to append the same array b to every row (i.e. adding multiple columns) to get an array c:

c = np.array([
    [1, 2, 3, 8, 9],
    [2, 3, 4, 8, 9]
    ])

How can I do this easily and efficiently in NumPy?

I am especially concerned about its behaviour with big datasets (where a is much bigger than b). Is there any way around creating many copies (i.e. a.shape[0] of them) of b?

Related to this question, but with multiple values.

MonsieurWave

3 Answers

3

Here's one way. I assume it's efficient because it's vectorised. It relies on the fact that in matrix multiplication, pre-multiplying a row by the column (1, 1) will produce two stacked copies of the row.

import numpy as np

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])

b = np.array([[8,9]])

np.concatenate([a, np.array([[1],[1]]).dot(b)], axis=1)

Out: array([[1, 2, 3, 8, 9],
            [2, 3, 4, 8, 9]])

Note that b is specified slightly differently (as a two-dimensional array).
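As the comments below discuss, the trick generalizes to any number of rows by building the column of ones from a's height. A sketch (my own generalization of the above, not part of the original answer):

import numpy as np

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])
b = np.array([[8, 9]])  # two-dimensional, shape (1, 2)

# One 1 per row of a; pre-multiplying b by this column stacks
# a.shape[0] copies of b on top of each other.
ones_col = np.ones((a.shape[0], 1), dtype=int)
c = np.concatenate([a, ones_col.dot(b)], axis=1)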

Is there any way around creating many copies of b?

The final result contains those copies (and numpy arrays are literally arrays of values in memory), so I don't see how.
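A quick way to see this (an illustrative check I'm adding, not from the original answer): the concatenated result owns one contiguous buffer whose size already accounts for every repeated value of b.

c = np.concatenate([a, np.array([[1], [1]]).dot(b)], axis=1)
print(c.flags['OWNDATA'])  # True: c has its own contiguous block of memory
print(c.nbytes)            # counts all ten elements, the copies of b included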

Denziloe
  • Although this doesn't sound very pythonic, one could imagine the values of 'b' to be stored on the disk and 'c' simply containing the pointer to an array of pointers pointing to the values of 'b'. – MonsieurWave Sep 01 '18 at 21:48
  • Yes -- operating with such objects is slower though, which is why `numpy` to my knowledge uses simple contiguous arrays of values. – Denziloe Sep 01 '18 at 21:52
  • Is it possible to generalize your answer, for example if the dimensions of 'a' are '[3,3]' or '[n,m]' for a given 'n' and 'm'? – MonsieurWave Sep 01 '18 at 21:56
  • Sure, it just requires that the vector of ones is of the same height as `a`, i.e. `np.ones([1, a.shape[0]], dtype=int)`. – Denziloe Sep 01 '18 at 21:59
  • I see, but wouldn't it be `np.ones([a.shape[0],1], dtype=int)` in that case? – MonsieurWave Sep 01 '18 at 22:10
  • Good spot, yes. – Denziloe Sep 01 '18 at 22:12
2

The way I solved this initially was:

c = np.concatenate([a, np.tile(b, (a.shape[0], 1))], axis=1)

But this feels very inefficient...

MonsieurWave
  • There's no Python-level loop! It should be `a.shape[0]`, shouldn't it? – hpaulj Sep 01 '18 at 21:47
  • A quick set of timings indicates that `repeat` is the fastest way of creating an (n, 2) array: `b.repeat(n).reshape(2,-1).T`. `repeat` is a built-in. – hpaulj Sep 01 '18 at 21:54
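For completeness, a sketch of the `repeat` approach mentioned in the comment above, with the 1-D b from the question and n = a.shape[0] (my own expansion of that comment):

n = a.shape[0]

# b.repeat(n) gives [8, 8, 9, 9]; reshaping to (len(b), n) and transposing
# yields n stacked copies of b as rows.
cols = b.repeat(n).reshape(len(b), -1).T
c = np.concatenate([a, cols], axis=1)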
2

An alternative to the concatenate approach is to make a recipient array and copy the values into it:

In [483]: a = np.arange(300).reshape(100,3)
In [484]: b=np.array([8,9])
In [485]: res = np.zeros((100,5),int)
In [486]: res[:,:3]=a
In [487]: res[:,3:]=b
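The same recipient-array idea written without the hard-coded sizes (a sketch based on the shapes of a and b above):

# Allocate the result once, then fill both blocks; assigning b to the
# right-hand block broadcasts it across every row of res.
res = np.zeros((a.shape[0], a.shape[1] + b.shape[0]), dtype=a.dtype)
res[:, :a.shape[1]] = a
res[:, a.shape[1]:] = b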

Sample timings:

In [488]: %%timeit
     ...: res = np.zeros((100,5),int)
     ...: res[:,:3]=a
     ...: res[:,3:]=b
     ...: 
     ...: 
6.11 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [491]: timeit np.concatenate((a, b.repeat(100).reshape(2,-1).T),1)
7.74 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [164]: timeit np.concatenate([a, np.ones([a.shape[0],1], dtype=int).dot(np.array([b]))], axis=1) 
8.58 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
hpaulj