How to expand values in a dataframe from another dataframe

Question

For an example consider two dataframes A and B

Dataframe A:

Name1 | Col1 | Col2
------+------+-----
 A    |  1   | 2
 B    |  4   | 3
 C    |  6   | 8

and Dataframe B:

Name2 | Col3 | Col4
------+------+-----
 P    |  5   |  9
 Q    |  0   |  1
 R    |  2   |  7

and I wish to convert them into one dataframe like

New Dataframe:

Name1 | Col1 | Col2 | Name2 | Col3 | Col4
------+------+------+-------+------+------
 A    |  1   |  2   |  P    |  5   | 9
 A    |  1   |  2   |  Q    |  0   | 1
 A    |  1   |  2   |  R    |  2   | 7
 B    |  4   |  3   |  P    |  5   | 9
 B    |  4   |  3   |  Q    |  0   | 1
 B    |  4   |  3   |  R    |  2   | 7
 C    |  6   |  8   |  P    |  5   | 9
 C    |  6   |  8   |  Q    |  0   | 1
 C    |  6   |  8   |  R    |  2   | 7

How can I change it ?

Umm I need to do this thing to prepare upon my data that I am building up for my ML model @ShubhamSharma. — Tavish Jain, Jun 20 '20 at 16:08
But why the keys are replicated three times only, there should be some logic here? — Shubham Sharma, Jun 20 '20 at 16:11
@ShubhamSharma the name actually represent github commit hashes and the files. It's quite complicated, but as I said, it is all gonna go in for a ML model — Tavish Jain, Jun 20 '20 at 16:15
Ohhh Cartesian Product !!!! That's the word I was looking for. And I guess yes that's what I was exactly looking for. Thanks Man — Tavish Jain, Jun 20 '20 at 16:22

score 2 · Answer 1 · answered Jun 20 '20 at 16:12

df1['key'] = 0
df2['key'] = 0
new_df = pd.merge(df1, df2, on='key').drop('key', axis=1)
print(new_df)

Prints:

  Name1  Col1  Col2 Name2  Col3  Col4
0     A     1     2     P     5     9
1     A     1     2     Q     0     1
2     A     1     2     R     2     7
3     B     4     3     P     5     9
4     B     4     3     Q     0     1
5     B     4     3     R     2     7
6     C     6     8     P     5     9
7     C     6     8     Q     0     1
8     C     6     8     R     2     7

score 0 · Answer 2 · answered Jun 20 '20 at 16:17

What you could do is something like this

A2 = A.copy()
B2 = B.copy()

DUMMY = 'dummy'
A2[DUMMY] = 1
B2[DUMMY] = 1

df = pd.merge(A2,B2)
df = df.loc[:,df.columns != DUMMY]

The reason it works is that merge joins on the common 'dummy' column on all values where they match. Given that dummy is constant, all the combinations match.

score 0 · Answer 3 · answered Jun 20 '20 at 16:21

Here's an answer using data transformation with numpy, the basic idea is to duplicate rows in df1 as many times as rows in df2.

n_cols = df1.shape[1]
vals = [np.hstack([np.repeat(df1.iloc[i,:], n_cols).values.reshape(-1, n_cols), df2.values]) 
        for i in range(len(df1))]
f = pd.DataFrame(np.vstack(vals), columns=df1.columns.tolist() + df2.columns.tolist())

print(f)

  Name1 Col1 Col2 Name2 Col3 Col4
0     A    A    A     P    5    9
1     1    1    1     Q    0    1
2     2    2    2     R    2    7
3     B    B    B     P    5    9
4     4    4    4     Q    0    1
5     3    3    3     R    2    7
6     C    C    C     P    5    9
7     6    6    6     Q    0    1
8     8    8    8     R    2    7

How to expand values in a dataframe from another dataframe

3 Answers3