I actually have 2 dataframes. One looks like this:

seq1_id seq2_id dN  dS  Dist1 Dist_brute  kingdom
seq1    seq2    45  56  23    455         eucaryota
seq6    seq9    34  43  34    453         procaryota
seq3    seq98   32  34  21    90          Virus
seq21   seq87   32  12  35    211         Virus

and the other like:

seq1_id seq2_id dN  dS  Dist1 Dist_brute
seq1    seq2    45  56  23    455
seq4    seq12   78  45  32    789
seq3    seq98   32  34  21    90          
seq21   seq87   32  12  35    211 
seq45   seq90   21  23  12    123
seq6    seq9    34  43  34    453  

and what I would like to do is get a new dataframe such as:

seq1_id seq2_id dN  dS  Dist1 Dist_brute   kingdom
seq1    seq2    45  56  23    455          eucaryota
seq4    seq12   78  45  32    789          NaN
seq3    seq98   32  34  21    90           Virus
seq21   seq87   32  12  35    211          Virus
seq45   seq90   21  23  12    123          NaN
seq6    seq9    34  43  34    453          procaryota

Does someone have an idea? Thanks :)

jpp
Grendel
I think you should look at the answer [here](https://stackoverflow.com/questions/28097222/pandas-merge-two-dataframes-with-different-columns) – PyPingu May 25 '18 at 13:31

1 Answer

For me, it works to omit the on parameter, so that merge joins on all columns the two dataframes share, with a left join:

df = df2.merge(df1, how='left')
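
To make the one-liner above reproducible, here is a minimal sketch that rebuilds the two frames from the question by hand (the names `df1` and `df2` follow the answer; the variable `shared` is only an illustration). It assumes the numeric columns are exactly equal in both frames, since they are part of the join key when `on` is omitted.

```python
import pandas as pd

# First frame from the question: has the extra `kingdom` column.
df1 = pd.DataFrame({
    "seq1_id": ["seq1", "seq6", "seq3", "seq21"],
    "seq2_id": ["seq2", "seq9", "seq98", "seq87"],
    "dN": [45, 34, 32, 32],
    "dS": [56, 43, 34, 12],
    "Dist1": [23, 34, 21, 35],
    "Dist_brute": [455, 453, 90, 211],
    "kingdom": ["eucaryota", "procaryota", "Virus", "Virus"],
})

# Second frame from the question: more rows, no `kingdom`.
df2 = pd.DataFrame({
    "seq1_id": ["seq1", "seq4", "seq3", "seq21", "seq45", "seq6"],
    "seq2_id": ["seq2", "seq12", "seq98", "seq87", "seq90", "seq9"],
    "dN": [45, 78, 32, 32, 21, 34],
    "dS": [56, 45, 34, 12, 23, 43],
    "Dist1": [23, 32, 21, 35, 12, 34],
    "Dist_brute": [455, 789, 90, 211, 123, 453],
})

# Columns present in both frames; these become the join keys
# when `on` is not passed to merge.
shared = [c for c in df2.columns if c in df1.columns]

# Left join: keep every row of df2, attach `kingdom` where a match exists.
df = df2.merge(df1, how="left")
print(shared)
print(df)
```

Rows of `df2` with no counterpart in `df1` keep their values and get NaN in `kingdom`.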

If you need to specify the columns to merge on:

df = df2.merge(df1, on=['seq1_id','seq2_id','dN','dS','Dist1','Dist_brute'], how='left')

print (df)
  seq1_id seq2_id  dN  dS  Dist1  Dist_brute     kingdom
0    seq1    seq2  45  56     23         455   eucaryota
1    seq4   seq12  78  45     32         789         NaN
2    seq3   seq98  32  34     21          90       Virus
3   seq21   seq87  32  12     35         211       Virus
4   seq45   seq90  21  23     12         123         NaN
5    seq6    seq9  34  43     34         453  procaryota
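
As a possible variation (not part of the answer above), the sketch below joins on the two ID columns only and adds `indicator=True` to show which rows found a match. It assumes that the pair `seq1_id`, `seq2_id` uniquely identifies a row, which the question does not state; the frames are parsed from whitespace-separated text purely for brevity.

```python
import io
import pandas as pd

# Parse the question's tables from whitespace-separated text.
df1 = pd.read_csv(io.StringIO("""\
seq1_id seq2_id dN dS Dist1 Dist_brute kingdom
seq1 seq2 45 56 23 455 eucaryota
seq6 seq9 34 43 34 453 procaryota
seq3 seq98 32 34 21 90 Virus
seq21 seq87 32 12 35 211 Virus
"""), sep=r"\s+")

df2 = pd.read_csv(io.StringIO("""\
seq1_id seq2_id dN dS Dist1 Dist_brute
seq1 seq2 45 56 23 455
seq4 seq12 78 45 32 789
seq3 seq98 32 34 21 90
seq21 seq87 32 12 35 211
seq45 seq90 21 23 12 123
seq6 seq9 34 43 34 453
"""), sep=r"\s+")

# Assumption: (seq1_id, seq2_id) identifies a row, so only those
# two columns plus `kingdom` are taken from df1.
keys = ["seq1_id", "seq2_id"]
checked = df2.merge(df1[keys + ["kingdom"]], on=keys, how="left", indicator=True)

# `_merge` is "both" for matched rows and "left_only" for unmatched ones.
print(checked)
```

Joining on the IDs alone avoids losing a match when a numeric column differs slightly between the two frames, at the cost of relying on the uniqueness assumption.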
jezrael