I have two np.ndarrays:
a is an array of shape (13000, 8, 315000) and type uint8
b is an array of shape (8,) and type float32
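For a sense of scale, here is a rough back-of-the-envelope size calculation. It is only a sketch: it allocates nothing, and it assumes the result comes back as float32, which is what NumPy's promotion rules give for a uint8 array combined with a float32 array.

```python
import numpy as np

# Shapes and dtypes as stated above; no large array is allocated here.
a_shape, a_dtype = (13000, 8, 315000), np.dtype(np.uint8)
out_shape = (a_shape[0], a_shape[2])

# Assumption: the output dtype follows NumPy's array promotion rules.
out_dtype = np.result_type(np.uint8, np.float32)

# Use int64 for the product so the element count cannot overflow.
a_bytes = int(np.prod(a_shape, dtype=np.int64)) * a_dtype.itemsize
out_bytes = int(np.prod(out_shape, dtype=np.int64)) * out_dtype.itemsize

print(f"a:      {a_bytes / 1e9:.2f} GB ({a_dtype})")
print(f"result: {out_bytes / 1e9:.2f} GB ({out_dtype})")
```

So the input is roughly 33 GB and the result roughly 16 GB, and any approach that first converts a to float32 would need about four times the input size in temporary memory.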
I want to multiply each slice along the second dimension (8) by the corresponding element in b and sum along that dimension (i.e. a dot product along the second axis). The result will be of shape (13000, 315000).
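To make the intended operation concrete, here is a minimal sketch on small stand-in arrays. The names a_small, b_small and ref and the reduced sizes are only illustrative; the real arrays are far too large to paste here. It checks an explicit loop against the two one-liners I tried.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for the real arrays: same dtypes, same axis layout.
a_small = rng.integers(0, 256, size=(5, 8, 7), dtype=np.uint8)
b_small = rng.random(8, dtype=np.float32)

# Explicit reference: weight each of the 8 slices by b[j], then sum them.
ref = np.zeros((5, 7), dtype=np.float32)
for j in range(8):
    ref += a_small[:, j, :].astype(np.float32) * b_small[j]

# The two one-line formulations of the same contraction.
via_einsum = np.einsum('ijk,j->ik', a_small, b_small)
via_dot = np.dot(a_small.transpose(0, 2, 1), b_small)

print(via_einsum.shape, via_dot.shape)
print(np.allclose(ref, via_einsum), np.allclose(ref, via_dot))
```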
I have devised two ways of doing this:
np.einsum('ijk,j->ik', a, b): using %timeit it gives 49 s ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.dot(a.transpose(0, 2, 1), b): using %timeit it gives 1min 8s ± 3.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
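For anyone who wants to compare candidates without the full-size data, the following is a scaled-down timing sketch using the standard timeit module instead of the IPython %timeit magic. The array sizes and the names a_small, b_small, candidates and timings are assumptions chosen so that it runs in a few seconds, and timings at this size will not necessarily rank the same way as on the real array.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)

# Scaled-down stand-ins (hypothetical sizes, chosen to fit easily in memory).
a_small = rng.integers(0, 256, size=(130, 8, 3150), dtype=np.uint8)
b_small = rng.random(8, dtype=np.float32)

# The two approaches from above; other candidates can be added to this dict.
candidates = {
    "einsum": lambda: np.einsum('ijk,j->ik', a_small, b_small),
    "dot": lambda: np.dot(a_small.transpose(0, 2, 1), b_small),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    timings[name] = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name}: {timings[name] * 1e3:.2f} ms per call")
```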
Are there faster alternatives?
Complementary information
np.show_config() returns:
blas_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
    NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blis_info:
    NOT AVAILABLE
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
a.flags:
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False
b.flags:
    C_CONTIGUOUS : True
    F_CONTIGUOUS : True
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False