Good day. I ran BLASTP with one model organism against three others, and I would now like to know which hit genes are shared among all the organisms. To do that, I have been trying to match the following three tables against a single table that combines them (I have not included it here because the post is already very long, but it is literally the three tables below pasted one after another).
#Frame 1 Hits Organism1
|OrganismoM |Organismo1 |
|gen_pep01 |hsa_pep01 |
|gen_pep01 |hsa_pep02 |
|gen_pep01 |hsa_pep03 |
|gen_pep03 |hsa_pep11 |
|gen_pep05 |hsa_pep20 |
#Frame 2 Hits Organism2
|OrganismoM |Organismo2 |
|gen_pep02 |rno_pep14 |
|gen_pep05 |rno_pep22 |
|gen_pep05 |rno_pep23 |
|gen_pep05 |rno_pep25 |
#Frame 3 Hits Organism3
|OrganismoM |Organismo3 |
|gen_pep01 |dre_pep01 |
|gen_pep03 |dre_pep08 |
|gen_pep08 |dre_pep99 |
What I am trying to obtain is a table that indicates the hits of each gene in each organism, something like this:
#Final frame
|OrganismM |Organism1 |Organism2 |Organism3 |
|gen_pep01 |hsa_pep01 |rno_pep01 |dre_pep01 |
|gen_pep01 |hsa_pep02 |rno_pep01 |dre_pep01 |
|gen_pep01 |hsa_pep03 |rno_pep01 |dre_pep01 |
|gen_pep02 |rno_pep14 |N/A |N/A |
|gen_pep03 |hsa_pep11 |dre_pep08 |N/A |
|gen_pep05 |hsa_pep20 |rno_pep22 |N/A |
|gen_pep05 |hsa_pep20 |rno_pep23 |N/A |
|gen_pep05 |hsa_pep20 |rno_pep25 |N/A |
|gen_pep08 |dre_pep99 |N/A |N/A |
This is my current attempt with match():
library(xlsx)
# Combined frame: the three hit tables pasted one after another
HitsOrganismMvsOrganismsGeneral <- read.xlsx("HitsOrganismoMvsOrganismosGeneral.xlsx", 1)
# Frame 1: hits against Organism1
HitsOrganismMvsOrganism1 <- read.xlsx("Frame1.xlsx", 1)
# For each gene in the combined frame, position of its first match in Frame 1 (NA if none)
MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1 <- match(HitsOrganismMvsOrganismsGeneral$OrganismoM, HitsOrganismMvsOrganism1$OrganismoM)
# TRUE where the gene has at least one hit in Frame 1
IndexMatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1 <- !is.na(MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1)
# Organism1 hit taken from the matched row
Index2MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1 <- HitsOrganismMvsOrganism1$Organismo1[MatchOrganismMvsOrganismsGeneralVSOrganismMvsOrganism1]
However, this gives the result below (note the values marked with "*"): only the first matching hit is returned, and it is repeated for every row of that gene.
|OrganismM |Organism1 |Organism2 |Organism3 |
|gen_pep01 |*hsa_pep01*|rno_pep01 |dre_pep01 |
|gen_pep01 |*hsa_pep01*|rno_pep01 |dre_pep01 |
|gen_pep01 |*hsa_pep01*|rno_pep01 |dre_pep01 |
|gen_pep02 |rno_pep14 |N/A |N/A |
|gen_pep03 |hsa_pep11 |dre_pep08 |N/A |
|gen_pep05 |hsa_pep20 |*rno_pep22*|N/A |
|gen_pep05 |hsa_pep20 |*rno_pep22*|N/A |
|gen_pep05 |hsa_pep20 |*rno_pep22*|N/A |
|gen_pep08 |dre_pep99 |N/A |N/A |
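The repetition can be reproduced on a small scale. The sketch below rebuilds Frame 1 inline as toy data (`frame1`, `query`, `idx`, and `all_idx` are illustrative names, not part of the original script) and shows that `match()` only ever returns the position of the first match, whereas `which()` returns every matching row:

```r
# Toy copy of Frame 1, typed in by hand from the table above (assumption:
# the real data has the same column names, OrganismoM and Organismo1).
frame1 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep01", "gen_pep01", "gen_pep03", "gen_pep05"),
  Organismo1 = c("hsa_pep01", "hsa_pep02", "hsa_pep03", "hsa_pep11", "hsa_pep20"),
  stringsAsFactors = FALSE
)

# The combined frame lists gen_pep01 once per hit, so it is queried three times.
query <- c("gen_pep01", "gen_pep01", "gen_pep01")

# match() returns the position of the FIRST match for each query element.
idx <- match(query, frame1$OrganismoM)
print(idx)                     # 1 1 1
print(frame1$Organismo1[idx])  # "hsa_pep01" repeated three times

# which() returns ALL rows for the gene instead.
all_idx <- which(frame1$OrganismoM == "gen_pep01")
print(frame1$Organismo1[all_idx])  # "hsa_pep01" "hsa_pep02" "hsa_pep03"
```

So the starred values appear to be expected behavior of `match()` rather than a problem with the data.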
Does anyone know how to fix this, or can you recommend an alternative method? Many thanks for your time, and have a great day.
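One alternative that might be worth trying is a full outer join on the model-organism column instead of `match()`. The following is only a minimal sketch with base R `merge()`: it assumes the column names shown in the three frames, rebuilds the frames inline as toy data, and uses the illustrative names `frame1`, `frame2`, `frame3`, `combined`, and `shared`. Note that with Frame 2 as posted, `gen_pep01` gets `NA` in the `Organismo2` column, because Frame 2 contains no `gen_pep01` row.

```r
# Toy copies of the three hit tables, typed in by hand from the post.
frame1 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep01", "gen_pep01", "gen_pep03", "gen_pep05"),
  Organismo1 = c("hsa_pep01", "hsa_pep02", "hsa_pep03", "hsa_pep11", "hsa_pep20"),
  stringsAsFactors = FALSE
)
frame2 <- data.frame(
  OrganismoM = c("gen_pep02", "gen_pep05", "gen_pep05", "gen_pep05"),
  Organismo2 = c("rno_pep14", "rno_pep22", "rno_pep23", "rno_pep25"),
  stringsAsFactors = FALSE
)
frame3 <- data.frame(
  OrganismoM = c("gen_pep01", "gen_pep03", "gen_pep08"),
  Organismo3 = c("dre_pep01", "dre_pep08", "dre_pep99"),
  stringsAsFactors = FALSE
)

# Full outer join: all = TRUE keeps genes that are missing from either side
# and fills the gaps with NA. Reduce() chains the join over the three frames.
combined <- Reduce(
  function(x, y) merge(x, y, by = "OrganismoM", all = TRUE),
  list(frame1, frame2, frame3)
)
print(combined)

# Genes with a hit in every organism are the rows without any NA.
shared <- combined[complete.cases(combined), ]
print(shared)
```

Because `merge()` pairs every hit of a gene in one frame with every hit of the same gene in the other, a gene with several hits keeps all of them, one per row, instead of repeating the first one.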