How to merge two datasets by the different values in R?

Question

I have two datasets and want to merge them. How I add to first dataset only the lines that are in the second that are not in the first?

Only add to final dataset if the value not exists in the another dataset. An example dataset:

x = data.frame(id = c("a","c","d","g"), 
               value = c(1,3,4,7))
y = data.frame(id = c("b","c","d","e","f"),
               value = c(5,6,8,9,7))

The merged dataset should look like (the order is not important):

a 1
b 5
c 3
d 4
e 9    
f 7
g 7

Please provide an example dataset. This would make it much less work for us to provide you with an answer... — Paul Hiemstra, Dec 14 '11 at 13:39
Edited your question to make the example dataset pastable in R — Paul Hiemstra, Dec 14 '11 at 13:50
Could you simply rbind (or plyr::rbind.fill) the rows that are not in the data frame? This question is similar. http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in — Btibert3, Dec 14 '11 at 13:58
How do you get `b=2` and `g=6` in your desired result? Shouldn't that be `b=5` and `g=7`? — Joshua Ulrich, Dec 14 '11 at 15:21

Luciano Selzer · Answer 1 · 2011-12-14T14:22:08.920

2

Using !, %in% and rbind:

rbind(x[!x$id %in% y$id,], y)
   id value
1   a     1
4   g     7
3   b     2
41  c     3
5   d     4
6   e     5
7   f     6

edited Dec 14 '11 at 14:22

answered Dec 14 '11 at 13:57

Luciano Selzer

9,806
3
42
40

score 2 · Answer 2 · answered Dec 14 '11 at 15:27

For your example to work, you first need to ensure that id in each data.frame are directly comparable. Since they're factors, you need ensure they have the same levels/labels; or you can just convert them to character.

# convert factors to character
x$id <- as.character(x$id)
y$id <- as.character(y$id)
# merge
z <- merge(x,y,by="id",all=TRUE)
# keep first value, if it exists
z$value <- ifelse(is.na(z$value.x),z$value.y,z$value.x)
# keep desired columns
z <- z[,c("id","value")]
z
#   id value
# 1  a     1
# 2  b     5
# 3  c     3
# 4  d     4
# 5  e     9
# 6  f     7
# 7  g     7

Roman Luštrik · Answer 3 · 2011-12-14T16:03:16.873

You already answered your own question, but just didn't realize it right away. :)

> merge(x,y,all=TRUE)
  id value
1  a     1
2  c     3
3  c     6
4  d     4
5  d     8
6  g     7
7  b     5
8  e     9
9  f     7

EDIT

I'm a bit dense here and I'm not sure where you're getting at, so I provide you with a shotgun approach. What I did was I merged the data.frames by id and copied values from x to y if y` was missing. Take whichever column you need.

> x = data.frame(id = c("a","c","d","g"), 
+   value = c(1,3,4,7))
> y = data.frame(id = c("b","c","d","e","f"),
+   value = c(5,6,8,9,7))
> xy <- merge(x, y, by = "id", all = TRUE)
> xy
  id value.x value.y
1  a       1      NA
2  c       3       6
3  d       4       8
4  g       7      NA
5  b      NA       5
6  e      NA       9
7  f      NA       7
> find.na <- is.na(xy[, "value.y"])
> xy$new.col <- xy[, "value.y"]
> xy[find.na, "new.col"] <- xy[find.na, "value.x"]
> xy
  id value.x value.y new.col
1  a       1      NA       1
2  c       3       6       6
3  d       4       8       8
4  g       7      NA       7
5  b      NA       5       5
6  e      NA       9       9
7  f      NA       7       7

> xy[order(as.character(xy$id)), ]
  id value.x value.y new.col
1  a       1      NA       1
5  b      NA       5       5
2  c       3       6       6
3  d       4       8       8
6  e      NA       9       9
7  f      NA       7       7
4  g       7      NA       7

I edit the dataset I provided, there was an error. The id is the same, but the values are different for the same id. I want to merge only by the letter. — Renato Dinhani, Dec 14 '11 at 14:11
I think this still is what the function does? You get multiple entries for "d"... — Paul Hiemstra, Dec 14 '11 at 14:18
How do you deal with this if you have multiple such variables that have same name in the 2 data sets being merged. For eg. if you merge datasets in SAS, same name variables are combined into one variable easily. Is there any straightforward way to do it in R too? — Shreyes, Jun 17 '13 at 17:52
@Shreyes care to post that as a question? Feel free to borrow any of the code in this answer. — Roman Luštrik, Jun 17 '13 at 18:42

How to merge two datasets by the different values in R?

3 Answers3