232

Sometimes it is useful to "clone" a row or column vector to a matrix. By cloning I mean converting a row vector such as

[1, 2, 3]

into a matrix

[[1, 2, 3],
 [1, 2, 3],
 [1, 2, 3]]

or a column vector such as

[[1],
 [2],
 [3]]

into

[[1, 1, 1],
 [2, 2, 2],
 [3, 3, 3]]

In MATLAB or Octave this is done pretty easily:

 x = [1, 2, 3]
 a = ones(3, 1) * x
 a =

    1   2   3
    1   2   3
    1   2   3
    
 b = (x') * ones(1, 3)
 b =

    1   1   1
    2   2   2
    3   3   3

I tried to repeat this in NumPy, but without success:

In [13]: x = array([1, 2, 3])
In [14]: ones((3, 1)) * x
Out[14]:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
# so far so good
In [16]: x.transpose() * ones((1, 3))
Out[16]: array([[ 1.,  2.,  3.]])
# not the column clone I expected
# I end up using:
In [17]: (ones((3, 1)) * x).transpose()
Out[17]:
array([[ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])

Why didn't the first method (In [16]) work? Is there a more elegant way to achieve this in Python?

Mad Physicist
Boris Gorelik
  • In Matlab, note that it is much faster to use `repmat`: `repmat([1 2 3],3,1)` or `repmat([1 2 3].',1,3)` – Luis Mendo Oct 08 '13 at 14:07
  • Octave also has `repmat`. – ma11hew28 Mar 31 '14 at 02:29
  • For those looking to do something similar with a pandas dataframe, check out the `tile_df` [linked here](http://stackoverflow.com/questions/13166842/pandas-dataframe-multiply-with-a-series) – zelusp Jan 20 '16 at 20:13

12 Answers

421

Use numpy.tile:

>>> tile(array([1,2,3]), (3, 1))
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

or for repeating columns:

>>> tile(array([[1,2,3]]).transpose(), (1, 3))
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
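
The snippets above assume that the NumPy names were imported directly (for example in a pylab session). A minimal self-contained sketch of the same idea with an explicit import, where the names rows and cols are only illustrative, might look like this:

```python
import numpy as np

x = np.array([1, 2, 3])

# Repeat x as rows: reps=(3, 1) means 3 copies along axis 0, 1 along axis 1.
rows = np.tile(x, (3, 1))

# Repeat x as columns: turn x into a (3, 1) column first, then tile along axis 1.
cols = np.tile(x.reshape(-1, 1), (1, 3))

print(rows)
print(cols)
```

Here reshape(-1, 1) is used instead of the double brackets plus transpose, so an already existing 1-D array can be reused as is.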
Løiten
pv.
  • Upvote! On my system, for a vector with 10000 elements repeated 1000 times, the `tile` method is 19.5 times faster than the method in the currently accepted answer (using the multiplication-operator-method). – Dr. Jan-Philip Gehrcke Jun 27 '12 at 14:26
  • In the second section ("repeating columns"), can you explain what the second set of square brackets does, i.e. `[[1,2,3]]`? – Ant Jan 09 '17 at 00:51
  • @Ant it makes into a 2D array with length 1 in the first axis (vertical on your screen) and length 3 in the second axis (horizontal on your screen). Transposing then makes it have length 3 in the first axis and length 1 in the second axis. A tile shape of `(1, 3)` copies this column over three times, which is why the rows of the result contain a single distinct element each. – BallpointBen Apr 13 '18 at 21:11
  • This should be the accepted answer since you can pass any vector already initialized while the accepted one can only work if you add the comma while you initialize the vector. Thanks ! – Yohan Obadia Feb 05 '19 at 14:30
  • I can't get this to work for a 2d to 3d solution :( – john k Sep 09 '19 at 23:54
  • This solution works for rows but not columns. – xFioraMstr18 Nov 02 '22 at 17:12
109

Here's an elegant, Pythonic way to do it:

>>> array([[1,2,3],]*3)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

>>> array([[1,2,3],]*3).transpose()
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

The problem with [16] seems to be that transpose has no effect on a 1-D array. You probably want a matrix instead:

>>> x = array([1,2,3])
>>> x
array([1, 2, 3])
>>> x.transpose()
array([1, 2, 3])
>>> matrix([1,2,3])
matrix([[1, 2, 3]])
>>> matrix([1,2,3]).transpose()
matrix([[1],
        [2],
        [3]])
Peter
  • (transpose works for 2D arrays, e.g. for the square one in the example, or when turning into a `(N,1)`-shape array using `.reshape(-1, 1)`) – Mark Mar 21 '14 at 13:28
  • This is highly inefficient. Use `numpy.tile` as shown in [pv.'s answer](http://stackoverflow.com/a/1582742/505088). – David Heffernan Jan 14 '16 at 17:08
  • As Peter says, it's Pythonic, which favours *readability* over efficiency. If speed is so critical I'd question why we are using Python in the first place. – c z Feb 05 '21 at 10:26
  • @cz that's why we are using numpy. because most of the time is spent on the matrix calculations which are done in LAPACK. – qwr Mar 30 '22 at 02:42
61

First note that with numpy's broadcasting operations it's usually not necessary to duplicate rows and columns. See this and this for descriptions.

But if you do need to duplicate, repeat and newaxis are probably the best way:

In [12]: x = array([1,2,3])

In [13]: repeat(x[:,newaxis], 3, 1)
Out[13]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [14]: repeat(x[newaxis,:], 3, 0)
Out[14]: 
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

This example is for a row vector, but applying this to a column vector is hopefully obvious. repeat seems to spell this well, but you can also do it via multiplication, as in your example:

In [15]: x = array([[1, 2, 3]])  # note the double brackets

In [16]: (ones((3,1))*x).transpose()
Out[16]: 
array([[ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])
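
To illustrate the opening remark that broadcasting usually makes the duplication unnecessary, here is a small sketch. The 3x3 array a is a made-up operand, and the names explicit and implicit are only illustrative:

```python
import numpy as np

x = np.array([1, 2, 3])
a = np.arange(9).reshape(3, 3)  # hypothetical 3x3 operand

# Explicitly clone x into three columns, then multiply elementwise.
explicit = np.repeat(x[:, np.newaxis], 3, axis=1) * a

# Broadcasting: the (3, 1) view is stretched across the columns implicitly.
implicit = x[:, np.newaxis] * a

print(np.array_equal(explicit, implicit))  # True
```

Both expressions give the same result, but the second one never materializes the cloned matrix.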
Femkemilene
tom10
  • newaxis has the additional benefit that it doesn't actually copy the data until it needs to. So if you are doing this to multiply or add to another 3x3 array, the repeat is unnecessary. Read up on numpy broadcasting to get the idea. – AFoglia Oct 12 '09 at 15:20
  • @AFoglia - Good point. I updated my answer to point this out. – tom10 Oct 12 '09 at 16:22
  • What benefits of using `np.repeat` vs `np.tile`? – mrgloom Dec 05 '18 at 17:53
  • @mrgloom: None, mostly, for this case. For a small 1D array, they're similar and there's not a significant difference/benefit/advantage/etc. Personally, I find the symmetry between the row and column cloning to be more intuitive, and I don't like the transpose needed for tile, but it's just a matter of taste. Mateen Ulhaq's answer also says repeat is faster, but this may depend on the exact use case that's considered, although repeat is much closer to the C-functionality, so will likely remain somewhat faster. In 2D they have different behaviors so it matters there. – tom10 Dec 07 '18 at 05:50
18

Let:

>>> n = 1000
>>> x = np.arange(n)
>>> reps = 10000

Zero-cost allocations

A view does not take any additional memory. Thus, these declarations are instantaneous:

# New axis
x[np.newaxis, ...]

# Broadcast to specific shape
np.broadcast_to(x, (reps, n))
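
As a rough check of the "view" claim, the sketch below inspects a small broadcast array. The sizes and the names view and copy are chosen only for illustration:

```python
import numpy as np

x = np.arange(5)
view = np.broadcast_to(x, (4, 5))

print(view.shape)                  # (4, 5)
print(view.strides[0])             # 0: every row points at the same memory
print(np.shares_memory(view, x))   # True
print(view.flags.writeable)        # False: broadcast_to returns a read-only view

# Copying forces a real (4, 5) allocation that can be modified safely.
copy = np.array(view)
copy[0, 0] = 99                    # x itself is unchanged
```

The zero stride along the first axis is what makes the view free: no row is ever duplicated in memory.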

Forced allocation

If you want to force the contents to reside in memory:

>>> %timeit np.array(np.broadcast_to(x, (reps, n)))
10.2 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit np.repeat(x[np.newaxis, :], reps, axis=0)
9.88 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit np.tile(x, (reps, 1))
9.97 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

All three methods are roughly the same speed.

Computation

>>> a = np.arange(reps * n).reshape(reps, n)
>>> x_tiled = np.tile(x, (reps, 1))

>>> %timeit np.broadcast_to(x, (reps, n)) * a
17.1 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit x[np.newaxis, :] * a
17.5 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit x_tiled * a
17.6 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

All three methods are roughly the same speed.


Conclusion

If you want to replicate before a computation, consider using one of the "zero-cost allocation" methods. You won't suffer the performance penalty of "forced allocation".

Mateen Ulhaq
9

I think combining newaxis with NumPy's own functions is the best and fastest approach.

I ran the following comparison:

import numpy as np
b = np.random.randn(1000)
In [105]: %timeit c = np.tile(b[:, np.newaxis], (1,100))
1000 loops, best of 3: 354 µs per loop

In [106]: %timeit c = np.repeat(b[:, np.newaxis], 100, axis=1)
1000 loops, best of 3: 347 µs per loop

In [107]: %timeit c = np.array([b,]*100).transpose()
100 loops, best of 3: 5.56 ms per loop

So tile and repeat are about 15 times faster than building the array from a Python list.

smartkevin
5

One clean solution is to use NumPy's outer-product function with a vector of ones:

np.outer(np.ones(n), x)

gives n repeating rows. Switch the argument order to get repeating columns. To get an equal number of rows and columns you might do

np.outer(np.ones_like(x), x)
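
A short sketch of both argument orders follows; the value of n and the names rows, cols and rows_int are illustrative assumptions. One thing to keep in mind is that np.ones defaults to floats, so the result is a float array unless a dtype is passed:

```python
import numpy as np

x = np.array([1, 2, 3])
n = 4  # hypothetical number of copies

rows = np.outer(np.ones(n), x)   # shape (4, 3): each row is x
cols = np.outer(x, np.ones(n))   # shape (3, 4): each column is x

print(rows.dtype)                # float64, because np.ones defaults to float

# Passing a dtype to np.ones keeps the integer type of x.
rows_int = np.outer(np.ones(n, dtype=x.dtype), x)
```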
Jon Deaton
4

You can use

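np.tile(x, 3).reshape((3, 3))

tile generates the repetitions of the vector, and reshape gives the result the shape you want. Note that the target shape must hold exactly as many elements as the tiled vector (9 for a 3-element x tiled 3 times).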
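
For a general number of repetitions, the target shape has to match the length of the tiled vector. A minimal sketch, assuming a 3-element x and a made-up repetition count n:

```python
import numpy as np

x = np.array([1, 2, 3])
n = 4  # hypothetical number of repetitions

flat = np.tile(x, n)              # 1-D, length n * x.size == 12
rows = flat.reshape((n, x.size))  # shape (4, 3): each row is x
cols = rows.T                     # shape (3, 4): each column is x
```

Deriving the shape from n and x.size avoids a mismatch between the element count and the requested shape.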

thebeancounter
4

Returning to the original question:

In MATLAB or octave this is done pretty easily:

x = [1, 2, 3]

a = ones(3, 1) * x ...

In numpy it's pretty much the same (and easy to memorize too):

x = [1, 2, 3]
a = np.tile(x, (3, 1))

output

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
Alex Fedotov
1

If you have a pandas dataframe and want to preserve the dtypes, even the categoricals, this is a fast way to do it:

import numpy as np
import pandas as pd
df = pd.DataFrame({1: [1, 2, 3], 2: [4, 5, 6]})
number_repeats = 50
new_df = df.reindex(np.tile(df.index, number_repeats))
The Unfun Cat
1

Another solution

>>> x = np.array([1,2,3])
>>> y = x[None, :] * np.ones((3,))[:, None]
>>> y
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Why? Sure, repeat and tile are the correct way to do this. But None indexing is a powerful tool that has many times let me quickly vectorize an operation (though it can quickly be very memory expensive!).

An example from my own code:

# trajectory is a sequence of xy coordinates [n_points, 2]
# xy_obstacles is a list of obstacles' xy coordinates [n_obstacles, 2]
# to compute dx, dy distance between every obstacle and every pose in the trajectory
deltas = trajectory[:, None, :2] - xy_obstacles[None, :, :2]
# we can easily convert x-y distance to a norm
distances = np.linalg.norm(deltas, axis=-1)
# distances is now [timesteps, obstacles]. Now we can for example find the closest obstacle at every point in the trajectory by doing
closest_obstacles = np.argmin(distances, axis=1)
# we could also find how safe the trajectory is, by finding the smallest distance over the entire trajectory
danger = np.min(distances)
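
The excerpt above depends on data from the original project. To try it out, here is a self-contained sketch with made-up coordinates (three trajectory points and two obstacles, both purely hypothetical):

```python
import numpy as np

# Hypothetical data: 3 trajectory points and 2 obstacles, as (x, y) pairs.
trajectory = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
xy_obstacles = np.array([[0.0, 1.0], [3.0, 1.0]])

# Shapes (3, 1, 2) and (1, 2, 2) broadcast to (3, 2, 2).
deltas = trajectory[:, None, :] - xy_obstacles[None, :, :]

# Euclidean distance for every (point, obstacle) pair: shape (3, 2).
distances = np.linalg.norm(deltas, axis=-1)

# Index of the nearest obstacle for each trajectory point.
closest_obstacles = np.argmin(distances, axis=1)

# Smallest distance over the whole trajectory.
danger = np.min(distances)

print(closest_obstacles)  # [0 0 1]
print(danger)             # 1.0
```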
Dugas
0

To answer the actual question, now that nearly a dozen approaches to working around a solution have been posted: x.transpose reverses the shape of x. One of the interesting side-effects is that if x.ndim == 1, the transpose does nothing.

This is especially confusing for people coming from MATLAB, where all arrays implicitly have at least two dimensions. The correct way to transpose a 1D numpy array is not x.transpose() or x.T, but rather

x[:, None]

or

x.reshape(-1, 1)

From here, you can multiply by a matrix of ones, or use any of the other suggested approaches, as long as you respect the (subtle) differences between MATLAB and numpy.
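
As a small sketch of that last step (the names col and b are only illustrative), the column clone from the question can be written as follows. It also shows why In [16] returned a single row: a shape (3,) array multiplied by a (1, 3) array broadcasts to (1, 3).

```python
import numpy as np

x = np.array([1, 2, 3])

# Transposing a 1-D array is a no-op: the shape stays (3,).
print(x.shape, x.T.shape)

# (3,) times (1, 3) broadcasts to (1, 3), which is what In [16] produced.
print((x.T * np.ones((1, 3))).shape)

# Make an explicit column first, then multiply by a row of ones.
col = x[:, None]               # shape (3, 1); x.reshape(-1, 1) is equivalent
b = col * np.ones((1, 3))      # broadcasts to (3, 3)
print(b)
```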

Mad Physicist
-1
import numpy as np
x = np.array([1, 2, 3])
y = np.multiply(np.ones((len(x), len(x))), x).T
print(y)

yields:

[[ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]]
kibitzforu