0

I came across a merge behaviour that I do not understand.

Let's say that you want to merge a numer of data.frames by a shared column, keep the other columns and rename them using suffixes:

dt1 <- data.frame(ind = 1:10, var1 = 2:11, var2 = 3:12)
dt2 <- data.frame(ind = 1:10, var1 = 5:14, var2 = 6:15)
dt3 <- data.frame(ind = 1:10, var1 = 7:16, var2 = 8:17)
dt4 <- data.frame(ind = 1:10, var1 = 10:19, var2 = 11:20)

dt_all <- dt1 %>%
    merge(dt2, by = "ind", all = TRUE, suffixes = c(".1", ".2")) %>%
    merge(dt3, by = "ind", all = TRUE, suffixes = c("", ".3")) %>%
    merge(dt4, by = "ind", all = TRUE, suffixes = c("", ".4"))

This is what I get:

> head(dt_all, 1)
  ind var1.1 var2.1 var1.2 var2.2 var1 var2 var1.4 var2.4 
1   1      2      3      5      6    7    8     10     11 

I expected to get var1.3 var2.3 instead of var1 var2. The result does not change if I do not use the dplyr pipes or use data.table with or without seting keys for all the tables. Can anyone explain this?

Frank
  • 66,179
  • 8
  • 96
  • 180
MRau
  • 336
  • 1
  • 8
  • 2
    The column names should be the same for the 'x' and 'y' dataset for the `suffixes` to work. In the first `merge` it worked because both of them have the 'var1', 'var2' column. It is not the case after after the merge, now it is var1.1, var1.2 etc which is getting a nonmatch with var1 var2 from dt3 – akrun Nov 23 '17 at 09:11
  • Ah, ok, I get it. Is there a way to force the `suffixes` to work? – MRau Nov 23 '17 at 09:17

0 Answers0