I am trying to compare 2 dataframes in R:
Keggs <- c("K001", "K002", "K003", "K004", "K005", "K006", "K007", "K008")
names <- c("Acaryochloris", "Proteobacteria", "Parvibaculum", "Alphaproteobacteria", "Rhodospirillum", "Magnetospirillum", "Coraliomargarita", "Bacteria")
family <- c("Proteos", "Cyanobacteria", "Rhizo", "Nostocales", "Bacteroidetes")
species <- c("Alphaproteobacteria", "Purrsia", "Parvibaculum", "Chico", "Rhodospirillum")
res <- data.frame(Keggs, names)
result <- data.frame(family, species)
Now, what I would like to do is compare each string in result$species with res$names. If there is a match, I would like it to return the string in result$family from that same row, as well as the matching string in res$Keggs, as a separate dataframe.
The end result would look like this:
> df3
Keggs family
K003 Rhizo
K004 Proteos
K005 Bacteroidetes
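For reference, the desired result can be reproduced on the sample data with a small base R sketch. It assumes merge() is acceptable at this toy size and that the two dataframes are joined on res$names and result$species. The final sort and row-name reset are only there to match the layout shown above.

```r
# Sample data from the question
Keggs   <- c("K001", "K002", "K003", "K004", "K005", "K006", "K007", "K008")
names   <- c("Acaryochloris", "Proteobacteria", "Parvibaculum", "Alphaproteobacteria",
             "Rhodospirillum", "Magnetospirillum", "Coraliomargarita", "Bacteria")
family  <- c("Proteos", "Cyanobacteria", "Rhizo", "Nostocales", "Bacteroidetes")
species <- c("Alphaproteobacteria", "Purrsia", "Parvibaculum", "Chico", "Rhodospirillum")

res    <- data.frame(Keggs, names, stringsAsFactors = FALSE)
result <- data.frame(family, species, stringsAsFactors = FALSE)

# Inner join: keep only rows where res$names equals result$species
joined <- merge(res, result, by.x = "names", by.y = "species")

# Keep the two wanted columns, sort by Keggs, and reset the row names
df3 <- joined[order(joined$Keggs), c("Keggs", "family")]
rownames(df3) <- NULL

df3  # three rows: K003/Rhizo, K004/Proteos, K005/Bacteroidetes
```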
I have searched for how to compare data.frames in R, and the closest I have found is this: compare df1 column 1 to all columns in df2 returning the index of df2
But that returns TRUE/FALSE values, and my res df has 2 columns.
In my searches I have run into the match() and merge() functions in base R; however, I am working with a "res" df that is 11,000,000 rows, and my "result" df is less than 1,000 rows. The match documentation gives the usage match(x, table, ...) and states under table: "long vectors are not supported". So I don't think that the match() or merge() approach is the most elegant, given the sheer size of my actual dfs. I have tried a loop, but I am limited in my loop skills and threw in the towel.
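A note on the size concern, with a hedged sketch: as far as I understand, "long vectors" in R are vectors with 2^31 or more elements, so an 11,000,000-row column should fall well below that limit. Under that assumption, a vectorised match() lookup might look like the following. The variable names idx and keep are just illustrative, and match() only returns the first match, so this assumes result$species has no duplicates.

```r
# Sample data from the question
Keggs   <- c("K001", "K002", "K003", "K004", "K005", "K006", "K007", "K008")
names   <- c("Acaryochloris", "Proteobacteria", "Parvibaculum", "Alphaproteobacteria",
             "Rhodospirillum", "Magnetospirillum", "Coraliomargarita", "Bacteria")
family  <- c("Proteos", "Cyanobacteria", "Rhizo", "Nostocales", "Bacteroidetes")
species <- c("Alphaproteobacteria", "Purrsia", "Parvibaculum", "Chico", "Rhodospirillum")

res    <- data.frame(Keggs, names, stringsAsFactors = FALSE)
result <- data.frame(family, species, stringsAsFactors = FALSE)

# Sanity check: the column is not a long vector (length below 2^31)
stopifnot(length(res$names) <= .Machine$integer.max)

# For each row of res, the position of its name in result$species (NA if absent)
idx  <- match(res$names, result$species)
keep <- !is.na(idx)

# Pair each matched Keggs value with the family from the matched result row
df3 <- data.frame(Keggs  = res$Keggs[keep],
                  family = result$family[idx[keep]],
                  stringsAsFactors = FALSE)

df3  # three rows: K003/Rhizo, K004/Proteos, K005/Bacteroidetes
```

This avoids an explicit loop and keeps the rows in the order they appear in res.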
I would be incredibly grateful for any insights into this conundrum.
Thank you in advance, Purrsia