Vectorizing Pairwise Column Element-wise Product in NumPy

Question

I have two DataFrames:

>>> d1

    A  B
0   4  3
1   5  2
2   4  3

>>> d2

    C  D  E
0   1  4  7
1   2  5  8
2   3  6  9

>>> what_I_want

    AC  AD  AE  BC  BD  BE
0   4   16  28  3   12  21
1   10  25  40  4   10  16
2   12  24  36  9   18  27

Both DataFrames have the same number of rows (say m) but different numbers of columns (say ncol_1 and ncol_2). The output is an m by (ncol_1 * ncol_2) DataFrame in which each column is the element-wise product of one column of d1 and one column of d2.

I have come across np.kron, but it does not do quite what I want. My actual data has millions of rows.

Is there a vectorized way of doing this? I currently have an itertools.product implementation, but it is excruciatingly slow.

Divakar · Accepted Answer · score 8 · 2019-11-06T18:36:55.700

One approach uses NumPy broadcasting:

import pandas as pd

a = d1.to_numpy(copy=False)  # d1.values on older pandas versions
b = d2.to_numpy(copy=False)
# (m, ncol_1, 1) * (m, 1, ncol_2) broadcasts to (m, ncol_1, ncol_2)
df_out = pd.DataFrame((a[:, :, None] * b[:, None, :]).reshape(len(a), -1))
df_out.columns = [i + j for i in d1.columns for j in d2.columns]
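To make the shapes concrete, here is a minimal sketch that rebuilds the question's d1 and d2 and walks through the broadcasting step by step. The intermediate names `prod`, `flat`, and `cols` are only illustrative, and passing `index=d1.index` is an assumption about wanting to keep the original row labels.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
d1 = pd.DataFrame({"A": [4, 5, 4], "B": [3, 2, 3]})
d2 = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6], "E": [7, 8, 9]})

a = d1.to_numpy()  # shape (3, 2)
b = d2.to_numpy()  # shape (3, 3)

# a[:, :, None] has shape (3, 2, 1) and b[:, None, :] has shape (3, 1, 3),
# so their product broadcasts to (3, 2, 3): one row per input row,
# one entry per (d1 column, d2 column) pair.
prod = a[:, :, None] * b[:, None, :]

# Flatten the last two axes; d1's columns vary slowest, matching the
# order of the column labels built below.
flat = prod.reshape(len(a), -1)  # shape (3, 6)
cols = [i + j for i in d1.columns for j in d2.columns]

df_out = pd.DataFrame(flat, columns=cols, index=d1.index)
print(df_out)
```

Note that the intermediate array holds m * ncol_1 * ncol_2 values, the same size as the final result, so memory use scales with the output rather than with any extra temporary.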

For large data, numexpr can spread the work across multiple cores:

import numexpr as ne

out = ne.evaluate('a3D*b3D',{'a3D':a[:,:,None],'b3D':b[:,None]}).reshape(len(a),-1)
df_out = pd.DataFrame(out)

edited Nov 06 '19 at 18:36

answered Nov 06 '19 at 18:30

Divakar


Thank you for your answer! It works like a charm. But I have a question: is there a reason that in the NumPy version you use ```b[:,None,:]``` but in the numexpr version you use ```b[:,None]```? – Kemeng Zhang Nov 06 '19 at 19:18
@KemengZhang Nah, it's the same. In the numexpr version, I was trying to make it compact to keep it within the answer post allowed width. – Divakar Nov 06 '19 at 19:28
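
For readers who want to check the equivalence mentioned in this exchange, a small sketch follows. The array `b` here is a stand-in with the same values as d2 from the question, and the variable names are illustrative only. Trailing axes that are not indexed are kept whole, so the explicit final `:` is optional.

```python
import numpy as np

# Stand-in for d2's values: a 3x3 array holding 1..9.
b = np.arange(1, 10).reshape(3, 3)

short = b[:, None]     # trailing axis is implied
full = b[:, None, :]   # trailing axis written out

# Both insert a new length-1 axis in the middle.
same_shape = short.shape == full.shape == (3, 1, 3)
same_values = np.array_equal(short, full)
print(same_shape, same_values)
```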

score 4 · Answer 2 · edited Nov 06 '19 at 18:56


IIUC, a for loop is not always bad here. Try:

pd.DataFrame({x+y: d1[x]*d2[y] for x in d1 for y in d2})
Out[81]: 
   AC  AD  AE  BC  BD  BE
0   4  16  28   3  12  21
1  10  25  40   4  10  16
2  12  24  36   9  18  27
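
As a quick sanity check, the sketch below compares this dict-comprehension approach with a broadcasting computation on the question's sample data. The names `loop_df`, `bcast`, and `matches` are illustrative. The loop only runs ncol_1 * ncol_2 times at the Python level, and each iteration is itself a vectorized column product, which is why it can stay reasonable when the row count is large and the column counts are small.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question.
d1 = pd.DataFrame({"A": [4, 5, 4], "B": [3, 2, 3]})
d2 = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6], "E": [7, 8, 9]})

# One vectorized Series product per (d1 column, d2 column) pair;
# dict insertion order gives AC, AD, AE, BC, BD, BE.
loop_df = pd.DataFrame({x + y: d1[x] * d2[y] for x in d1 for y in d2})

# Broadcasting version of the same computation, for comparison.
a, b = d1.to_numpy(), d2.to_numpy()
bcast = (a[:, :, None] * b[:, None, :]).reshape(len(a), -1)

matches = np.array_equal(loop_df.to_numpy(), bcast)
print(matches)
```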

edited Nov 06 '19 at 18:56

Umar.H


answered Nov 06 '19 at 18:26

BENY

