Good day. I am trying to find the matches from the following 3 tables in a single table that combines them (I have not included it here because the post was already very long, but it is literally the three tables below stacked one after another). I ran Blastp from one model organism against 3 others, and now I would like to know which hit genes are shared among all the organisms.

#Frame 1 Hits Organism1 
|OrganismoM |Organismo1 |
|gen_pep01  |hsa_pep01  |
|gen_pep01  |hsa_pep02  |
|gen_pep01  |hsa_pep03  |
|gen_pep03  |hsa_pep11  |
|gen_pep05  |hsa_pep20  |

#Frame 2 Hits Organism2 
|OrganismoM |Organismo2 |
|gen_pep02  |rno_pep14  |
|gen_pep05  |rno_pep22  |
|gen_pep05  |rno_pep23  |
|gen_pep05  |rno_pep25  |

#Frame 3 Hits Organism3   
|OrganismoM |Organismo3 |
|gen_pep01  |dre_pep01  |
|gen_pep03  |dre_pep08  |
|gen_pep08  |dre_pep99  |

What I am trying to obtain is a table that indicates the hits of each gene in each organism, something like this:

#Final frame
|OrganismM  |Organism1  |Organism2  |Organism3  |
|gen_pep01  |hsa_pep01  |rno_pep01  |dre_pep01  |
|gen_pep01  |hsa_pep02  |rno_pep01  |dre_pep01  |
|gen_pep01  |hsa_pep03  |rno_pep01  |dre_pep01  |
|gen_pep02  |rno_pep14  |N/A        |N/A        |
|gen_pep03  |hsa_pep11  |dre_pep08  |N/A        |
|gen_pep05  |hsa_pep20  |rno_pep22  |N/A        |
|gen_pep05  |hsa_pep20  |rno_pep23  |N/A        |
|gen_pep05  |hsa_pep20  |rno_pep25  |N/A        |
|gen_pep08  |drep_pep99 |N/A        |N/A        |

These are my current attempts with `match`:

library(xlsx)
# Combined frame: Frames 1-3 stacked one after another
HitsOrganismMvsOrganismsGeneral <- read.xlsx("HitsOrganismoMvsOrganismosGeneral.xlsx", 1)
# Frame 1
HitsOrganismMvsOrganism1 <- read.xlsx("Frame1.xlsx", 1)
# Position of the first Frame 1 row matching each row of the combined frame
MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1 <- match(HitsOrganismMvsOrganismsGeneral$OrganismM, HitsOrganismMvsOrganism1$OrganismM)
# TRUE where a match was found
IndexMatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1 <- !is.na(MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1)
# Organism1 hit at each matched position
Index2MatchOrganismoMvsOrganismosGeneralVSOrganismoMvsOrganismo1 <- HitsOrganismMvsOrganism1$Organismo1[MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1]

But my current `match` attempts give the following (note the entries marked with "*"): only the first matching gene is returned, and it is repeated on every row.

|OrganismM  |Organism1  |Organism2  |Organism3  |
|gen_pep01  |*hsa_pep01*|rno_pep01  |dre_pep01  |
|gen_pep01  |*hsa_pep01*|rno_pep01  |dre_pep01  |
|gen_pep01  |*hsa_pep01*|rno_pep01  |dre_pep01  |
|gen_pep02  |rno_pep14  |N/A        |N/A        |
|gen_pep03  |hsa_pep11  |dre_pep08  |N/A        |
|gen_pep05  |hsa_pep20  |*rno_pep22*|N/A        |
|gen_pep05  |hsa_pep20  |*rno_pep22*|N/A        |
|gen_pep05  |hsa_pep20  |*rno_pep22*|N/A        |
|gen_pep08  |drep_pep99 |N/A        |N/A        |
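
The repetition in the table above is consistent with how `match` is documented to work: it returns the position of the first match only. The following is a minimal sketch, assuming a data frame named `frame1` (a hypothetical name) that holds the Frame 1 values from this question. It contrasts `match` with two base R ways of getting every match, `which` and `split`.

```r
# Frame 1 values copied from the question; `frame1` is a hypothetical name
frame1 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep01", "gen_pep01", "gen_pep03", "gen_pep05"),
  Organismo1 = c("hsa_pep01", "hsa_pep02", "hsa_pep03", "hsa_pep11", "hsa_pep20"),
  stringsAsFactors = FALSE
)

# match() gives one position per lookup value: the first hit only
first_hit <- match("gen_pep01", frame1$OrganismoM)
frame1$Organismo1[first_hit]

# which() with == gives every position where the value occurs
all_hits <- which(frame1$OrganismoM == "gen_pep01")
frame1$Organismo1[all_hits]

# split() groups all hits by model-organism gene in one step
hits_by_gene <- split(frame1$Organismo1, frame1$OrganismoM)
hits_by_gene[["gen_pep01"]]
```

Here `first_hit` points at a single row, while `all_hits` and `hits_by_gene` keep all three `hsa_pep` hits for `gen_pep01`.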

Does anyone know how to fix this, or can you recommend an alternative method? Many thanks for your time and have a great day.

Phil
Luis T
  • Hi and welcome Luis, please read [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/12242625) and update your question. And are you really sure you want to have variable names like `Index2MatchOrganismoMvsOrganismosGeneralVSOrganismoMvsOrganismo1`? It's max confusing from my point of view. – Marco_CH Jan 18 '22 at 16:57
  • What's the logic for including `gen_pep01` in Organismo2 as `rno_pep01` (rows 1-3) in the final frame? – Andre Wildberg Jan 18 '22 at 16:59
  • Oh I just noticed, that part is wrong, it has nothing to do with it, rno_pep01 doesn't even exist in the tables example, thanks for noticing. I will upload a more simplified question, with the specific function that I would like to know. Basically I want a function that works like match but finds all matches related to a value, not just the first one. – Luis T Jan 19 '22 at 16:56
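
One possible way to get every match rather than only the first, as requested in the last comment, is a full outer join of the three frames on the model-organism column. The sketch below uses base R `merge` with `all = TRUE`, chained with `Reduce`. The names `frame1`, `frame2`, `frame3` and `combined` are hypothetical, and the data are retyped from the tables in the question rather than read from the original `.xlsx` files. It assumes the goal is one row per combination of hits, with `NA` where an organism has no hit.

```r
# Hit tables retyped from the question (hypothetical object names)
frame1 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep01", "gen_pep01", "gen_pep03", "gen_pep05"),
  Organismo1 = c("hsa_pep01", "hsa_pep02", "hsa_pep03", "hsa_pep11", "hsa_pep20"),
  stringsAsFactors = FALSE
)
frame2 <- data.frame(
  OrganismoM = c("gen_pep02", "gen_pep05", "gen_pep05", "gen_pep05"),
  Organismo2 = c("rno_pep14", "rno_pep22", "rno_pep23", "rno_pep25"),
  stringsAsFactors = FALSE
)
frame3 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep03", "gen_pep08"),
  Organismo3 = c("dre_pep01", "dre_pep08", "dre_pep99"),
  stringsAsFactors = FALSE
)

# Full outer join on OrganismoM: all = TRUE keeps genes missing from either
# side (filled with NA), and repeated keys produce one row per combination
combined <- Reduce(
  function(x, y) merge(x, y, by = "OrganismoM", all = TRUE),
  list(frame1, frame2, frame3)
)

print(combined)
```

With these inputs, `gen_pep01` keeps its three distinct `hsa_pep` hits and `gen_pep05` keeps its three distinct `rno_pep` hits, instead of the first hit being repeated. Genes with several hits in more than one organism would be expanded to every combination, which may or may not be what is wanted.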

0 Answers