I was trying to implement a kronecker product function. Below are three ideas that I have:
def kron(arr1, arr2):
"""columnwise outer product, avoiding relocate elements.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
res = np.empty((nrows, ncols))
for idx1 in range(c1):
for idx2 in range(c2):
new_c = idx1 * c2 + idx2
temp = np.zeros((r2, r1))
temp_kron = scipy.linalg.blas.dger(
alpha=1.0, x=arr2[:, idx2], y=arr1[:, idx1], incx=1, incy=1,
a=temp)
res[:, new_c] = np.ravel(temp_kron, order='F')
return res
def kron2(arr1, arr2):
"""First outer product, then rearrange items.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
tmp = np.outer(arr2, arr1)
res = np.empty((nrows, ncols))
for idx in range(arr1.size):
for offset in range(c2):
orig = tmp[offset::c2, idx]
dest_coffset = idx % c1 * c2 + offset
dest_roffset = (idx // c1) * r2
res[dest_roffset:dest_roffset+r2, dest_coffset] = orig
return res
def kron3(arr1, arr2):
"""First outer product, then rearrange items.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
tmp = np.outer(np.ravel(arr2, 'F'), np.ravel(arr1, 'F'))
res = np.empty((nrows, ncols))
for idx in range(arr1.size):
for offset in range(c2):
orig_offset = offset * r2
orig = tmp[orig_offset:orig_offset+r2, idx]
dest_c = idx // r1 * c2 + offset
dest_r = idx % r1 * r2
res[dest_r:dest_r+r2, dest_c] = orig
return res
Based on this stackoverflow post I created a MeasureTime decorator. A natural benchmark would be to compare against numpy.kron. Below are my test functions:
@MeasureTime
def test_np_kron(arr1, arr2, number=1000):
for _ in range(number):
np.kron(arr1, arr2)
return
@MeasureTime
def test_kron(arr1, arr2, number=1000):
for _ in range(number):
kron(arr1, arr2)
@MeasureTime
def test_kron2(arr1, arr2, number=1000):
for _ in range(number):
kron2(arr2, arr1)
@MeasureTime
def test_kron3(arr1, arr2, number=1000):
for _ in range(number):
kron3(arr2, arr1)
Turned out that Numpy's kron function performances much better:
arr1 = np.array([[1,-4,7], [-2, 3, 3]], dtype=np.float64, order='F')
arr2 = np.array([[8, -9, -6, 5], [1, -3, -4, 7], [2, 8, -8, -3], [1, 2, -5, -1]], dtype=np.float64, order='F')
In [243]: test_np_kron(arr1, arr2, number=10000)
Out [243]: "test_np_kron": 0.19688990000577178s
In [244]: test_kron(arr1, arr2, number=10000)
Out [244]: "test_kron": 0.6094115000014426s
In [245]: test_kron2(arr1, arr2, number=10000)
Out [245]: "test_kron2": 0.5699560000066413s
In [246]: test_kron3(arr1, arr2, number=10000)
Out [246]: "test_kron3": 0.7134822000080021s
I would like to know why that is the case? Is that because Numpy's reshape method is much more performant than manually copying over stuff (although still using numpy)? I was puzzled, since otherwise, I was using np.outer / blas.dger as well. The only difference I recognized here was how we arranged the ending results. How come NumPy's reshape perform this good?
Here is the link to NumPy 1.17 kron source.
Updates: Forgot to mention in the first place that I was trying to prototype in python, and then implement kron using C++ with cblas/lapack. Had some existing 'kron' needing to be refactored. I then came across Numpy's reshape and got really impressed.
Thanks in advance for your time!