
I have two DataFrames like this:

df1
      Entry           Sequence
0    A0A024QZ18    MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1    A0A024QZ42    MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf
2    A0A024QZB8    MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD
3    A0A024QZP7    MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR
4    A0A024QZX5    MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL
5    A0A024QZ33    MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH

df2

    Seq_id       number
0   A0A024QZ18     67
1   A0A024QZ33     45
2   A0A024QZ42     252
3   A0A024QZB8     35
4   A0A024QZP7     34
5   A0A024QZX5     54

I want to check which Entry values in df1 are present in the Seq_id column of df2. For those that are present, I want to add the corresponding Sequence from df1 as a new column in df2, in the row of the matching ID. If an ID is not present, the value should be 'nan'.

Expected output:

    Seq_id       number   Sequence
0   A0A024QZ18     67     MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1   A0A024QZ33     45     MNSPGGRGKKKGSGGASNPVPPRPPP
2   A0A024QZ42     252    MAALSGGGGGGAEPGQALFNGDMEPEAG
3   A0A024QZB8     35     MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD...
4   A0A024QZP7     34     MARFGDEMPARYGGGGSGAAAGVVVGSGG
5   A0A024QZX5     54     MRPDRAEAPGPPAMAAGGPGAGSAAPVSS

I tried checking whether they are present in the column as follows:

df2.Seq_id.isin(df1.Entry)

But I don't know how to bring in the other column (Sequence) when the IDs match, and get NaN when they don't.
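One way to go from the `isin` check to the actual values is `Series.map` (a swapped-in technique, not something from the question): build an Entry → Sequence lookup from df1 and map df2's Seq_id through it, so IDs without a match come out as NaN. This is a minimal sketch, assuming Entry values are unique in df1 (`map` raises if the lookup index has duplicates). The data is a subset of the question's; `A0A024QZ99` and its `number` of 0 are hypothetical, added only to show the unmatched case.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Entry": ["A0A024QZ18", "A0A024QZ42", "A0A024QZ33"],
    "Sequence": ["MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ",
                 "MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf",
                 "MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH"],
})
df2 = pd.DataFrame({
    "Seq_id": ["A0A024QZ18", "A0A024QZ33", "A0A024QZ42", "A0A024QZ99"],
    "number": [67, 45, 252, 0],  # last row is hypothetical (no match in df1)
})

# isin() only says whether each Seq_id exists in df1.Entry (True/False).
present = df2["Seq_id"].isin(df1["Entry"])
print(present.tolist())

# map() replaces each Seq_id with the value stored at that label in the
# Entry -> Sequence Series; labels with no match become NaN.
lookup = df1.set_index("Entry")["Sequence"]
df2["Sequence"] = df2["Seq_id"].map(lookup)
print(df2)
```

Unlike a merge, this adds exactly one column and never changes df2's row count.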

bob

1 Answer


I think a simple left join will satisfy your requirements:

df1.merge(df2, how='left', left_on='Entry', right_on='Seq_id')

which gives the following output:

     Entry                                        Sequence      Seq_id  number
 A0A024QZ18  MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ  A0A024QZ18      67
 A0A024QZ42   MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf  A0A024QZ42     252
 A0A024QZB8  MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD  A0A024QZB8      35
 A0A024QZP7  MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR  A0A024QZP7      34
 A0A024QZX5  MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL  A0A024QZX5      54
 A0A024QZ33  MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH  A0A024QZ33      45
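The join above uses df1 as the left frame, so it keeps every row of df1. Since the question asks for df2 with a Sequence column added, here is a sketch that puts df2 on the left instead and drops the redundant Entry column, assuming the column names from the question. The data is a subset of the question's; `A0A024QZ99` and its `number` of 0 are hypothetical, with no match in df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Entry": ["A0A024QZP7", "A0A024QZX5"],
    "Sequence": ["MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR",
                 "MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL"],
})
df2 = pd.DataFrame({
    "Seq_id": ["A0A024QZP7", "A0A024QZX5", "A0A024QZ99"],  # last ID is hypothetical
    "number": [34, 54, 0],  # 0 is a placeholder for the hypothetical row
})

# df2 on the left: every Seq_id is kept in df2's order, and Seq_ids
# with no matching Entry get NaN in the Sequence column.
result = (
    df2.merge(df1, how="left", left_on="Seq_id", right_on="Entry")
       .drop(columns="Entry")  # Entry only duplicates Seq_id after the join
)
print(result)
```

If Entry can repeat in df1, a left merge produces one row per match, so df2 rows may be duplicated; deduplicating df1 on Entry first avoids that.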
Prince Francis
  • Thank you for your help, this is great. Sorry, my DataFrames are big and I have only shown a small part here; there will be non-matching rows. I want to get 'nan' for those. Can you help with this? – bob Dec 17 '19 at 09:43
  • If there is a non matching row, it will be automatically filled as `NaN`. – Prince Francis Dec 17 '19 at 09:44
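The NaN filling applies to rows of the *left* frame that find no partner: their right-side columns become NaN, while unmatched rows of the right frame are dropped. A small sketch to make this visible uses `merge`'s `indicator=True`, which adds a `_merge` column with "both", "left_only" or "right_only". `ONLY_IN_DF1`, `ONLY_IN_DF2`, `MPLACEHOLDER` and the `number` of 0 are hypothetical values, not data from the question.

```python
import pandas as pd

# "ONLY_IN_DF1", "ONLY_IN_DF2" and "MPLACEHOLDER" are hypothetical values.
df1 = pd.DataFrame({
    "Entry": ["A0A024QZB8", "ONLY_IN_DF1"],
    "Sequence": ["MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD", "MPLACEHOLDER"],
})
df2 = pd.DataFrame({"Seq_id": ["A0A024QZB8", "ONLY_IN_DF2"], "number": [35, 0]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
df1_left = df1.merge(df2, how="left", left_on="Entry", right_on="Seq_id", indicator=True)
df2_left = df2.merge(df1, how="left", left_on="Seq_id", right_on="Entry", indicator=True)

print(df1_left)  # ONLY_IN_DF1 kept with NaN Seq_id/number; ONLY_IN_DF2 absent
print(df2_left)  # ONLY_IN_DF2 kept with NaN Entry/Sequence
```

So to get 'nan' for df2 IDs that are missing from df1, df2 has to be the left frame.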