0

I have two dataframes, df1:

group    value
g1       A
g1       B
g1       C
g1       D
g2       B
g2       C
g2       E
g3       A
g3       D
g3       E
g4       B
g4       D
...

and df2:

value1   value2
A        B
A        C
B        C
B        D
...

I would like to merge df1 and df2 by merging columns value with both value1 and value2, namely:

group    value1    value2
g1       A         B
g1       A         C
g1       B         C
g2       B         C
g1       B         D
g4       B         D
...

As shown, group illustrates two columns value1 and value2 only if the two values simultaneously corresponds to the same group in df1.

df = df1.merge(df2, left_on='value', right_on='value1', how='inner')
df = df.merge(df2, left_on='value', right_on='value2', how='inner')
df = df.drop(['value'], axis=1)

The problem is, my actual dataframes df1 and df2 are quite large and every time I try to run the code, the session crashes (Not to mention, not quite sure if my code is correct to begin with).

Any idea how to optimize this process (and to confirm whether my approach is correct)?

jstaxlin
  • 517
  • 4
  • 18

1 Answers1

2

What is the need for the second merge? Isn't one enough?

df = df1.merge(df2, how='inner', left_on=['value'], right_on=['value1']).drop(columns=['value'])
print(df)

  group value1 value2
0    g1      A      B
1    g1      A      C
2    g3      A      B
3    g3      A      C
4    g1      B      C
5    g1      B      D
6    g2      B      C
7    g2      B      D
8    g4      B      C
9    g4      B      D
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • The problem is, if I do this, I would get some "incorrectly merged" rows. For example, the third and fourth rows (`g3`, `A`, `B`) (`g3`, `A`, `C`) are incorrect, as `A` corresponds to `g3` but neither `B` nor `C` in `df1`. – jstaxlin May 09 '20 at 14:31