0

Hello I would need to merge two dataframe

The first has Two column

df1

Groups Names
G1    SeqA
G1    Seq1(+)
G1    SeqC
G1    Seq2(-)
G1    Seq3(+)
G1    SeqB
G1    SeqG
G2    Seq8(-)
G2    SeqY
G2    SeqH
G3    Seq10(+)
G3    SeqK
G3    Seq12(-) 

and another one such as df2:

COL1    COL2  COL3
Seq1(+) SeqA  89
Seq2(-) SeqA  67
Seq3(+) SeqB  89
Seq2(-) SeqC  98
Seq8(-) SeqY  45
Seq8(-) SeqH  90
Seq10(+) SeqK 76
Seq11(-) SeqL 78
Seq12(-) SeqK 89

and the idea is to kind of merge the two dataframe in order to add Groups information into the df2.

And the idea would be to first check in df1 for each element in Names, if this element has a (+) or a (-),then check its corresponding Group and then go on the df2 and add at the column Group the corresponding Group.

Here for instance in df1 line 2 there is the Seq1(+), its Group is G1, then I look for the line in df2 where Seq1(+) is present in COL1 and I add G1 to the column Group.

So at the end I should get something like :

Group COL1 COL2 COL3
G1 Seq1(+) SeqA 89
G1 Seq2(-) SeqA 67
G1 Seq3(+) SeqB 89
G1 Seq2(-) SeqC 98
G2 Seq8(-) SeqY 45
G2 Seq8(-) SeqH 90 
G3 Seq10(+) SeqK 76
G3 Seq12(-) SeqK 89

As you can see Seq11(-) is not present in the last df merged because it was not present in the df1 for G3

Does anyone have an idea using python and pandas ?

1 Answers1

0
df3 = df1.merge(df2, how='inner', left_on='Names', right_on='COL1').drop('Names', axis=1)


Groups  COL1    COL2    COL3
0   G1  Seq1(+) SeqA    89
1   G1  Seq2(-) SeqA    67
2   G1  Seq2(-) SeqC    98
3   G1  Seq3(+) SeqB    89
4   G2  Seq8(-) SeqY    45
5   G2  Seq8(-) SeqH    90
6   G3  Seq10(+)    SeqK    76
7   G3  Seq12(-)    SeqK    89
warped
  • 8,947
  • 3
  • 22
  • 49