16

In the following example

x <- data.frame(code = 7:9, food = c('banana', 'apple', 'popcorn'))
y <- data.frame(food = c('banana', 'apple', 'popcorn'),
                isfruit = c('fruit', 'fruit', 'not fruit'))

I would like to do x <- merge(x, y), but the problem is that merge() reorders the columns so that the by column (food) comes first. How can I prevent this and have merge(x, y) use the same column order of x and just insert the new variable (isFruit) as the third column (i.e., "code, food, isFruit" instead of "food, code, isFruit")?

I've tried this, to no avail:

merge(x, y, sort = F)

My workaround is to do this afterward

x <- x[c(2, 1, 3)]
Waldir Leoncio
  • 10,853
  • 19
  • 77
  • 107

4 Answers4

33

Here's a generic version of your base workaround:

merge(x, y)[, union(names(x), names(y))]
eddi
  • 49,088
  • 6
  • 104
  • 155
  • Note that passing in `union` for col names only works on data frames - not data tables. – Serenthia Apr 08 '20 at 22:57
  • 4
    @Serenthia add `, with = FALSE` for `data.table` – eddi Apr 09 '20 at 16:26
  • 1
    This fails when merging using "by.x=" and "by.y=" and columns which differ in names. An ugly version of the above workaround is (using modified example data by OP with "food" and "food2" as merging columns): `x <- data.frame(code = 7:9, food = c('banana', 'apple', 'popcorn')) ; y <- data.frame(food2 = c('banana', 'apple', 'popcorn'), isfruit = c('fruit', 'fruit', 'not fruit')) ; byY="food2" ; merge(x, y, by.x="food",by.y=byY)[, union(names(x), subset(names(y), names(y)!=byY))]` – al-ash Mar 16 '21 at 07:36
13

plyr makes this easy:

 x <- data.frame(code = 7:9, food = c('banana', 'apple', 'popcorn'))
 y <- data.frame(food = c('banana', 'apple', 'popcorn'),
                isfruit = c('fruit', 'fruit', 'not fruit'))

library(plyr)
join(x,y)

        #GOOD 
#Joining by: food
#  code    food   isfruit
#1    7  banana     fruit
#2    8   apple     fruit
#3    9 popcorn not fruit

    #BAD  
# merge(x,y)
#     food code   isfruit
#1   apple    8     fruit
#2  banana    7     fruit
#3 popcorn    9 not fruit
user1317221_G
  • 15,087
  • 3
  • 52
  • 78
6

You can wrap it in your custom function. For example :

merge.keep <- function(...,ord=union(names(x), names(y)))merge(...)[ord]

then for example:

merge.keep(x,y)
  code    food   isfruit
1    8   apple     fruit
2    7  banana     fruit
3    9 popcorn not fruit

EDIT I use @Eddi idea to set default values of ord.

agstudy
  • 119,832
  • 17
  • 199
  • 261
  • 2
    -1 because this doesn't add anything to the OP - what OP wants is **not** having to specify the order manually – eddi Jul 10 '13 at 19:11
  • 1
    @eddi good catch even I disagree with the downvote( I find it a little bit sharp). My idea was to create a function that extend merge functions. I edit my answer using your good idea to set default values of order. – agstudy Jul 10 '13 at 23:19
  • @agstudy hello, this solution does not work if I have data tables with different names (e.g. x = DT1, y = DT2) and if I merge on different columns on these data tables. Err msg: Error in as.vector(x) : object 'x' not found. Makes sense, because the data tables are not named x and y to begin with. I tried to find a workaround, but without success. Do you know how to adapt the solution that it works more generally on (more complicated) data table merges? – EDC Oct 21 '17 at 17:12
2

If you only bring in one column and want to append it last then maybe merge is overkill and you can just do an assingment with a match-[indexing approach:

> x$isfruit <- y$isfruit[match(y$food, x$food)]
> x
  code    food   isfruit
1    7  banana     fruit
2    8   apple     fruit
3    9 popcorn not fruit

(There are no switches to throw in the merge function to do what you ask.)

IRTFM
  • 258,963
  • 21
  • 364
  • 487