Efficient product of columns in Pandas

Question

I am wondering whether it is possible via a Pandas function to achieve the following. Given two Pandas DataFrames, get a new DataFrame whose columns are the Cartesian product of the columns in the two given DataFrames. That is, in a simple example, if we have the two DataFrames:

df1 = pd.DataFrame([[1,2], [1,2]], columns = ['a', 'b'])
df2 = pd.DataFrame([[3,4], [3,4]], columns = ['c', 'd'])

which look like

df1                 df2
   a  b                c  d
0  1  2             0  3  4
1  1  2             1  3  4

I am looking for a function that provides, without looping, the following:

df
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8

wouldn't the cartesian product be something else, like (1,3) (1,4) (2,3) (2,4) \\ (1,3) (1,4) (2,3) (2,4) ? See https://stackoverflow.com/a/35268188/4248972 for an answer — pasbi, Sep 17 '17 at 14:17
i changed a tag to `numpy` in order to attract Numpy experts... — MaxU - stand with Ukraine, Sep 17 '17 at 14:20

Zero · Answer 1 · 2017-09-17T14:41:13.917

score 3

You could use pd.concat with add_prefix, multiplying df2 by each column (Series) of df1.

In [806]: pd.concat([df2.mul(df1[c], axis=0).add_prefix(c+'_') for c in df1], axis=1)
Out[806]:
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8
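For anyone who wants to run this outside IPython, here is a minimal self-contained sketch of the same idea. The helper name `column_products` and its `sep` parameter are made up for illustration and are not part of pandas. Note that `mul(..., axis=0)` aligns on the row index, so this assumes both frames share the same index.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])


def column_products(left, right, sep='_'):
    """Hypothetical helper: multiply every column of `left` by every column of `right`."""
    # One block per column of `left`: scale all of `right` by that column (row-wise),
    # then prefix the block's column names with the `left` column name.
    pieces = [
        right.mul(left[c], axis=0).add_prefix(f'{c}{sep}')
        for c in left.columns
    ]
    # Place the blocks side by side.
    return pd.concat(pieces, axis=1)


result = column_products(df1, df2)
print(result)
```

There is still a Python-level loop over the columns of df1, but each iteration is a vectorized multiplication, so the cost grows with the number of columns rather than the number of rows.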

edited Sep 17 '17 at 14:41

answered Sep 17 '17 at 14:28

Zero

score 2 · Accepted Answer · edited Sep 17 '17 at 14:37

This could be an option:

dfjoin = pd.concat((df2.mul(y, axis=0) for _, y in df1.items()), axis=1, keys=df1.columns)
# This next line courtesy of MaxU's comment:
dfjoin.columns = dfjoin.columns.map('_'.join)
dfjoin
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8
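To make the two steps easier to follow, here is a small sketch that captures the column labels before and after flattening. The variable names `nested` and `flat` are only for illustration. It uses `DataFrame.items()`, which is the spelling available in current pandas.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])

# keys= labels each block with the df1 column it came from, which gives
# two-level column labels such as ('a', 'c') and ('a', 'd').
dfjoin = pd.concat(
    (df2.mul(col, axis=0) for _, col in df1.items()),
    axis=1,
    keys=df1.columns,
)
nested = list(dfjoin.columns)

# Joining each label tuple with '_' flattens the columns to plain strings.
dfjoin.columns = dfjoin.columns.map('_'.join)
flat = list(dfjoin.columns)

print(nested)
print(flat)
```

The join step assumes all column names are strings; with integer column names, `'_'.join` would raise a TypeError.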

edited Sep 17 '17 at 14:37

Zero

answered Sep 17 '17 at 14:07

erasmortg

Thanks, yes, this gets the columns' content correctly. But is it possible to also get the columns in a one-dimensional index? – splinter Sep 17 '17 at 14:16
@splinter, try this `r.columns = r.columns.swaplevel().map('_'.join)`, where `r` is a resulting DF – MaxU - stand with Ukraine Sep 17 '17 at 14:19
@MaxU, precisely! – splinter Sep 17 '17 at 14:27

score 1 · Answer 3 · answered Sep 17 '17 at 14:57

Here's a NumPy approach that uses broadcasting on the underlying array data, with a focus on performance:

out = (df2.values[:,None] * df1.values[:,:,None]).reshape(df1.shape[0],-1)
cols = [i+'_'+j for i in df1.columns for j in df2.columns]
df_out = pd.DataFrame(out, columns=cols, index=df1.index)
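As a rough illustration of why the broadcasting works, the sketch below spells out the intermediate shapes. The names `a`, `b` and `prod` are introduced here for illustration only. It assumes both frames have the same number of rows and pairs rows by position, ignoring index labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])

a = df1.to_numpy()[:, :, None]  # shape (rows, df1 columns, 1)
b = df2.to_numpy()[:, None, :]  # shape (rows, 1, df2 columns)

# Broadcasting expands the length-1 axes, so prod[r, i, j] = df1[r, i] * df2[r, j].
prod = a * b                    # shape (rows, df1 columns, df2 columns)

# Flatten the last two axes; df1 columns vary slowest, matching the label order below.
out = prod.reshape(len(df1), -1)
cols = [f'{i}_{j}' for i in df1.columns for j in df2.columns]
df_out = pd.DataFrame(out, columns=cols, index=df1.index)

print(prod.shape)
print(df_out)
```

Passing `index=df1.index` keeps the original row labels; without it the result gets a fresh default integer index.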

answered Sep 17 '17 at 14:57

Divakar

this is brilliant ! – MaxU - stand with Ukraine Sep 17 '17 at 15:05
@MaxU Thanks for adding NumPy tag! – Divakar Sep 17 '17 at 15:06
