0

How do i horizontally merge/combine 3 dataframes in R? I have three dataframes that have a word in one column, and the word count extracted from a text in the next, kind of like this:

  word.        count
1 hello.         6
2 test.          3
3 how.           8
4 are.           4
5 you.           1

and let's say dataframe 2:

  word.        count
1 hello.         6
2 test.          3
3 i.             3
4 am.            6
5 good.          2

how do i merge them like this:

  word.         df1.     df2.      total
1 hello.         6.       6.         12
2 test.          3        3           6
3 how.           8        0.          8 
4 are.           4        0           4
5 you.           1        0           1
6 i              0        3           3
7 am             0        6           6
8 good           0        2           2

instead of 2 dataframes, i thus have 3

thanks!

2 Answers2

0

Assuming that each has the same joining column:

out <- Reduce(function(a,b) merge(a, b, by = "word.", all = TRUE),
              list(df1, df2))
# normally this has `NA` in not-shared words, convert these NAs to 0
out[,-1] <- lapply(out[,-1], function(a) replace(a, is.na(a), 0))
out
#    word. count.x count.y
# 1    am.       0       6
# 2   are.       4       0
# 3  good.       0       2
# 4 hello.       6       6
# 5   how.       8       0
# 6     i.       0       3
# 7  test.       3       3
# 8   you.       1       0

However many frames you have, put them all in a list and this just works. (See https://stackoverflow.com/a/24376207/3358272 for discussion of "list of frames" efficiencies.)

All you need to do now is change the column names. (Many techniques for doing this.)


Data

df1 <- structure(list(word. = c("hello.", "test.", "how.", "are.", "you."), count = c(6L, 3L, 8L, 4L, 1L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
df2 <- structure(list(word. = c("hello.", "test.", "i.", "am.", "good."), count = c(6L, 3L, 3L, 6L, 2L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

Most R functions that do merge, or using a more technical term, do a join operation between tables, are designed to be used on two tables at a time. In base R, we have the merge function for these join operation's. Since you have more than two tables to merge, even with this behaviour, you can easily replicate te join's accross your tables, with the Reduce function. Like defined below:

Here are the input data:

word <- c("hello.", "test.", "how.", "are.", "you.") 

df1 <- data.frame(
  word = word,
  count = 11:15
)

set.seed(1)
df2 <- data.frame(
  word = sample(word, size = 8, replace = T),
  value2 = rnorm(8)
)

set.seed(1)
df3 <- data.frame(
  word = word[c(3,4)],
  value3 = rnorm(2)
)

Here are the operation of join:

list_dfs <- list(df1, df2, df3)

multi_inner <- Reduce(
  function(x, y, ...) merge(x, y, by = "word", all = TRUE), 
  list_dfs
)

Here is the result:

    word count       value2     value3
1   are.    14 -0.294720447  0.1836433
2 hello.    11 -0.928567035         NA
3 hello.    11 -0.005767173         NA
4   how.    13 -0.799009249 -0.6264538
5   how.    13 -0.289461574 -0.6264538
6  test.    12  2.404653389         NA
7  test.    12 -1.147657009         NA
8   you.    15  0.763593461         NA

Now, in this example, I am considering that you want to save all the possible combinations between the 3 tables. If you just want to identify, the matches (or the words in the word column) that appears in all of your tables, you need to set the all argument to FALSE. Like this:

multi_inner <- Reduce(
  function(x, y, ...) merge(x, y, by = "word", all = FALSE), 
  list_dfs
)

Resulting this:

  word count     value2     value3
1 are.    14 -0.2947204  0.1836433
2 how.    13 -0.7990092 -0.6264538
3 how.    13 -0.2894616 -0.6264538
Pedro Faria
  • 707
  • 3
  • 7