The method below fills one diagonal at a time:
import numpy as np
x = np.zeros((4, 6), dtype=np.int)
for i, v in enumerate((6,7,8)):
np.fill_diagonal(x[:,i:], v)
array([[6, 7, 8, 0, 0, 0],
[0, 6, 7, 8, 0, 0],
[0, 0, 6, 7, 8, 0],
[0, 0, 0, 6, 7, 8]])
or you could do the one liner:
x = [6,7,8,0,0,0]
y = np.vstack([np.roll(x,i) for i in range(4)])
Personally, I prefer the first since it's easier to understand and probably faster since it doesn't build all the temporary 1D arrays.
Edit:
Since a discussion of efficiency has come up, it might be worthwhile to run a test. I also included time to the toeplitz
method suggested by chthonicdaemon (although personally I interpreted the question to exclude this approach since it uses a package rather than using original code -- also though speed isn't the point of the original question either).
import numpy as np
import timeit
import scipy.linalg as sl
def a(m, n):
x = np.zeros((m, m), dtype=np.int)
for i, v in enumerate((6,7,8)):
np.fill_diagonal(x[:,i:], v)
def b(m, n):
x = np.zeros((n,))
x[:3] = vals
y = np.vstack([np.roll(x,i) for i in range(m)])
def c(m, n):
x = np.zeros((n,))
x[:3] = vals
y = np.zeros((m,))
y[0] = vals[0]
r = sl.toeplitz(y, x)
return r
m, n = 4, 6
print timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000)
print timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000)
print timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=1000)
m, n = 1000, 1006
print timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000)
print timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000)
print timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=100)
# which gives:
0.03525209 # fill_diagonal
0.07554483 # vstack
0.07058787 # toeplitz
0.18803215 # fill_diagonal
2.58780789 # vstack
1.57608604 # toeplitz
So the first method is about a 2-3x faster for small arrays and 10-20x faster for larger arrays.