4

EDIT: I understand the question of merging multiple data frames has been asked previously, but I was stuck on how to merge multiple tables without converting to data frames first, to be more concise. If you want to see how to merge multiple data frames, please see the great answer here (also linked below).

So I have a way to merge multiple data frames in R already, but I am hoping someone can help me find a way that's more elegant. Here is an example of code that I have that works. Given that df1, df2, and df3 are data frames with identical columns (including column name 'class') but differing numbers of rows, I can do:

table1 <- table(df1$class)
table2 <- table(df2$class)
table3 <- table(df3$class)

and as given by this answer, I can then merge them:

merged.table <- Reduce(function(...) merge(..., all=T), list(table1, table2, table3))

My problem is that the merge is incorrect because the objects table1, table2, and table3 have identical identification names and the merged.table ends up combining the data to one column.

My workaround is to convert the tables to data frames, as such:

table1 <- as.data.frame(table(df1$class))
colnames(table1) <- c("ID","counts1")
table2 <- as.data.frame(table(df2$class))
colnames(table2) <- c("ID","counts2")
table3 <- as.data.frame(table(df3$class))
colnames(table3) <- c("ID","counts3")

Then the merge works just fine. But let me tell you, that gets really clunky and tedious after a while, and I need to do this kind of thing a lot.

Is there any way to achieve the same goal without converting tables to data frames and assigning column names?

Here is an example of what the data frame looks like, truncated for simplicity:

transcript <- rep(c("a","b","c","d","e","f"))
family <- rep(c("L1","L2","ERV"),2)
class <- rep(c("LINE","LINE","LTR"),2)

df1 <- data.frame(transcript, family, class)

transcript  family  class
a            L1     LINE
b            L2     LINE
c            ERV    LTR
d            L1     LINE
e            L2     LINE
f            ERV    LTR
Community
  • 1
  • 1
lmrta
  • 305
  • 1
  • 2
  • 13
  • Great! The `by="Var1"` trick worked great. Could you leave it as an answer so I can accept it? – lmrta Jul 25 '16 at 13:12

1 Answers1

5

We will need to add by = "Var1" argument to merge:

# dummy data
transcript <- rep(c("a","b","c","d","e","f"))
family <- rep(c("L1","L2","ERV"),2)
class <- rep(c("LINE","LINE","LTR"),2)
df1 <- data.frame(transcript, family, class)

# get table as data.frame
table1 <- as.data.frame(table(df1$class))
table2 <- as.data.frame(table(df1$class))
table3 <- as.data.frame(table(df1$class))

# merge without by
Reduce(function(...) merge(..., all = TRUE),
       list(table1, table2, table3))
#   Var1 Freq
# 1 LINE    4
# 2  LTR    2

# merge with by = "Var1"
Reduce(function(...) merge(..., all = TRUE, by = "Var1"),
       list(table1, table2, table3))

#   Var1 Freq.x Freq.y Freq
# 1 LINE      4      4    4
# 2  LTR      2      2    2
zx8754
  • 52,746
  • 12
  • 114
  • 209