I am wondering whether it is possible to achieve the following with a pandas function. Given two pandas DataFrames, get a new DataFrame whose columns are the Cartesian product of the columns of the two given DataFrames. For example, given the two DataFrames:

df1 = pd.DataFrame([[1,2], [1,2]], columns = ['a', 'b'])
df2 = pd.DataFrame([[3,4], [3,4]], columns = ['c', 'd'])

which look like

df1                 df2
   a  b                c  d
0  1  2             0  3  4
1  1  2             1  3  4

I am looking for a function that produces the following without looping, where each column i_j holds the row-wise product df1[i] * df2[j]:

df
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8
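To make the target precise: each output column i_j is the row-wise product df1[i] * df2[j], with df1's columns varying slowest. A minimal reference sketch follows. It still loops over column pairs in Python, so it is not what the question asks for, but it is handy for checking the vectorized answers:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])

# Reference definition: one column per (df1 column, df2 column) pair,
# named "<df1 col>_<df2 col>" and holding the row-wise product.
# The dict preserves insertion order, so df1's columns vary slowest.
expected = pd.DataFrame(
    {f'{i}_{j}': df1[i] * df2[j] for i in df1.columns for j in df2.columns}
)
print(expected)
#    a_c  a_d  b_c  b_d
# 0    3    4    6    8
# 1    3    4    6    8
```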
MaxU - stand with Ukraine
splinter

3 Answers

You could use pd.concat with add_prefix, multiplying df2 by each of df1's columns (a Series) with mul along axis=0:

In [806]: pd.concat([df2.mul(df1[c], axis=0).add_prefix(c+'_') for c in df1], axis=1)
Out[806]:
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8
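The step that does the work is mul(..., axis=0): multiplying a DataFrame by a Series along axis=0 aligns the Series with the rows, so every column of df2 is scaled by the same column of df1. A small sketch isolating one iteration of the comprehension, using the question's frames:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])

# One iteration of the comprehension, for df1's column 'b':
# scale every column of df2 by the Series df1['b'] (aligned on the row index),
# then rename the resulting columns 'c', 'd' -> 'b_c', 'b_d'.
step = df2.mul(df1['b'], axis=0).add_prefix('b_')
print(step)
#    b_c  b_d
# 0    6    8
# 1    6    8

# pd.concat(..., axis=1) then places one such block per df1 column side by side.
result = pd.concat(
    [df2.mul(df1[col], axis=0).add_prefix(col + '_') for col in df1], axis=1
)
```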
Zero

This could be an option:

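# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
# keys=df1.columns labels each block with the df1 column it came from.
dfjoin = pd.concat((df2.mul(y, axis=0) for _, y in df1.items()), axis=1, keys=df1.columns)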
# This next line courtesy of MaxU's comment:
dfjoin.columns = dfjoin.columns.map('_'.join)
dfjoin
   a_c  a_d  b_c  b_d
0    3    4    6    8
1    3    4    6    8
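Passing keys to pd.concat labels each block with the df1 column it came from, which gives two-level (MultiIndex) columns; map('_'.join) then joins each (outer, inner) tuple into one name. A sketch showing the intermediate columns, assuming string column labels (str.join raises a TypeError on non-string labels):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
df2 = pd.DataFrame([[3, 4], [3, 4]], columns=['c', 'd'])

dfjoin = pd.concat(
    (df2.mul(y, axis=0) for _, y in df1.items()), axis=1, keys=df1.columns
)
# Before flattening, the columns are a MultiIndex of (df1 col, df2 col) tuples.
print(dfjoin.columns.tolist())
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

# '_'.join turns each tuple into a single string label
# (assumes every label is a str; otherwise join raises TypeError).
dfjoin.columns = dfjoin.columns.map('_'.join)
print(dfjoin.columns.tolist())
# ['a_c', 'a_d', 'b_c', 'b_d']
```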
Zero
erasmortg

Here's a NumPy approach that uses broadcasting on the underlying arrays, aimed at performance:

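import pandas as pd
# (n, 1, m2) * (n, m1, 1) broadcasts to (n, m1, m2); then flatten the last two axes
out = (df2.to_numpy()[:, None] * df1.to_numpy()[:, :, None]).reshape(df1.shape[0], -1)
cols = [i + '_' + j for i in df1.columns for j in df2.columns]
df_out = pd.DataFrame(out, columns=cols, index=df1.index)  # keep df1's row labels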
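To see why the reshape lines up with cols, track the shapes: with n rows, m1 columns in df1 and m2 in df2, the two views broadcast to (n, m1, m2), and NumPy's default row-major reshape keeps the df2 axis varying fastest, which is the same order as the nested comprehension. A sketch that checks this against the pd.concat answer; the third column 'e' in df2 is hypothetical, added only so that m1 != m2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
# Hypothetical third column 'e' so the two widths differ (m1 = 2, m2 = 3).
df2 = pd.DataFrame([[3, 4, 5], [3, 4, 5]], columns=['c', 'd', 'e'])

n, m1 = df1.shape
m2 = df2.shape[1]

# (n, 1, m2) * (n, m1, 1) broadcasts to (n, m1, m2):
# prod[r, i, j] == df1.iloc[r, i] * df2.iloc[r, j]
prod = df2.to_numpy()[:, None] * df1.to_numpy()[:, :, None]
assert prod.shape == (n, m1, m2)

# Row-major reshape flattens (i, j) with j varying fastest,
# matching [i + '_' + j for i in df1.columns for j in df2.columns].
out = prod.reshape(n, -1)
cols = [i + '_' + j for i in df1.columns for j in df2.columns]
df_out = pd.DataFrame(out, columns=cols, index=df1.index)

# Cross-check against the pd.concat/add_prefix answer.
check = pd.concat(
    [df2.mul(df1[col], axis=0).add_prefix(col + '_') for col in df1], axis=1
)
assert np.array_equal(df_out.to_numpy(), check[cols].to_numpy())
```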
Divakar