0

I'm attempting to merge two dataframes. One dataframe contains rownames which appear as values within a column of another dataframe. I would like to append a single column (Top.Viral.TaxID.Name) from the second dataframe based upon these mutual values, to the first dataframe.

The first dataframe looks like this:

         ERR1780367  ERR1780369  ERR2013703    xxx...    


374840      73          0            0                      
417290      56          57           20                      
1923444     57          20           102                     
349409      40          0            0                      
265522      353         401          22                       
322019      175         231          35                       

The second dataframe looks like this:

       Top.Viral.TaxID       Top.Viral.TaxID.Name


1        374840              Enterobacteria phage phiX174 sensu lato
2        417290              Saccharopolyspora erythraea prophage pSE211
3        1923444             Shahe picorna-like virus 14
4        417290              Saccharopolyspora erythraea prophage pSE211
5        981323              Gordonia phage GTE2
6        349409              Pandoravirus dulcis

However, I would also like to preserve the rownames of the first dataframe, so the result would look something like this:

         ERR1780367  ERR1780369  ERR2013703    xxx...    Top.Viral.TaxID.Name


374840      73          0            0                   Enterobacteria phage phiX174 sensu lato
417290      56          57           20                  Saccharopolyspora erythraea prophage pSE211            
1923444     57          20           102                 Shahe picorna-like virus 14     
349409      40          0            0                   Pandoravirus dulcis   
265522      353         401          22                  Hyposoter fugitivus ichnovirus     
322019      175         231          35                  Acanthocystis turfacea Chlorella virus 1    

Thanks in advance.

2 Answers2

1

I would strongly recommend against relying on rownames. They are embarrasingly often removed, and the function in dplyr/tidyr always strip them.

Always make the rownames a part of the data, i.e. use "tidy" data sets as in the example below

data(iris)
# We mix the data a bit, to check if rownames are conserved
iris = iris[sample.int(nrow(iris), 20),]
head(iris)

description = 
  data.frame(Species = unique(iris$Species))
description$fullname = paste("The wonderful", description$Species)
description

# .... the above are your data
iris = cbind(row = rownames(iris), iris)

# Now it is easy
merge(iris, description, by="Species")

And please, use reproducibly data when asking questions in SO to get fast answers. It is lot of work to reformat the data you presented into a form that can be tested.

Dieter Menne
  • 10,076
  • 44
  • 67
-1

Use sapply to loop through rownames of dataframe 1 (df1) and search the id in the dataframe 2 (df2), returning the description in the same row. Something like this

df1$Top.Viral.TaxID.Name <- sapply(rownames(df1), (function(id){
  df2$Top.Viral.TaxID.Name[df2$Top.Viral.TaxID == id]  
}))
Damiano Fantini
  • 1,925
  • 9
  • 11
  • Thanks for your reply. I'm still fairly new to R, so I am not exactly sure why the resulting values from running this solution appears to give me "integer(0)" for all values in df1$Top.Viral.TaxID.Name? – erichards52 Aug 12 '17 at 16:58
  • Sounds like "df2$Top.Viral.TaxID == id" is returning no matches. Can you share the data.frames? – Damiano Fantini Aug 12 '17 at 17:05