1

I have two datasets and want to merge them. How I add to first dataset only the lines that are in the second that are not in the first?

Only add to final dataset if the value not exists in the another dataset. An example dataset:

x = data.frame(id = c("a","c","d","g"), 
               value = c(1,3,4,7))
y = data.frame(id = c("b","c","d","e","f"),
               value = c(5,6,8,9,7))   

The merged dataset should look like (the order is not important):

a 1
b 5
c 3
d 4
e 9    
f 7
g 7
Renato Dinhani
  • 35,057
  • 55
  • 139
  • 199

3 Answers3

2

Using !, %in% and rbind:

rbind(x[!x$id %in% y$id,], y)
   id value
1   a     1
4   g     7
3   b     2
41  c     3
5   d     4
6   e     5
7   f     6
Luciano Selzer
  • 9,806
  • 3
  • 42
  • 40
2

For your example to work, you first need to ensure that id in each data.frame are directly comparable. Since they're factors, you need ensure they have the same levels/labels; or you can just convert them to character.

# convert factors to character
x$id <- as.character(x$id)
y$id <- as.character(y$id)
# merge
z <- merge(x,y,by="id",all=TRUE)
# keep first value, if it exists
z$value <- ifelse(is.na(z$value.x),z$value.y,z$value.x)
# keep desired columns
z <- z[,c("id","value")]
z
#   id value
# 1  a     1
# 2  b     5
# 3  c     3
# 4  d     4
# 5  e     9
# 6  f     7
# 7  g     7
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
1

You already answered your own question, but just didn't realize it right away. :)

> merge(x,y,all=TRUE)
  id value
1  a     1
2  c     3
3  c     6
4  d     4
5  d     8
6  g     7
7  b     5
8  e     9
9  f     7

EDIT

I'm a bit dense here and I'm not sure where you're getting at, so I provide you with a shotgun approach. What I did was I merged the data.frames by id and copied values from x to y if y` was missing. Take whichever column you need.

> x = data.frame(id = c("a","c","d","g"), 
+   value = c(1,3,4,7))
> y = data.frame(id = c("b","c","d","e","f"),
+   value = c(5,6,8,9,7))
> xy <- merge(x, y, by = "id", all = TRUE)
> xy
  id value.x value.y
1  a       1      NA
2  c       3       6
3  d       4       8
4  g       7      NA
5  b      NA       5
6  e      NA       9
7  f      NA       7
> find.na <- is.na(xy[, "value.y"])
> xy$new.col <- xy[, "value.y"]
> xy[find.na, "new.col"] <- xy[find.na, "value.x"]
> xy
  id value.x value.y new.col
1  a       1      NA       1
2  c       3       6       6
3  d       4       8       8
4  g       7      NA       7
5  b      NA       5       5
6  e      NA       9       9
7  f      NA       7       7

> xy[order(as.character(xy$id)), ]
  id value.x value.y new.col
1  a       1      NA       1
5  b      NA       5       5
2  c       3       6       6
3  d       4       8       8
6  e      NA       9       9
7  f      NA       7       7
4  g       7      NA       7
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
  • I edit the dataset I provided, there was an error. The id is the same, but the values are different for the same id. I want to merge only by the letter. – Renato Dinhani Dec 14 '11 at 14:11
  • I think this still is what the function does? You get multiple entries for "d"... – Paul Hiemstra Dec 14 '11 at 14:18
  • How do you deal with this if you have multiple such variables that have same name in the 2 data sets being merged. For eg. if you merge datasets in SAS, same name variables are combined into one variable easily. Is there any straightforward way to do it in R too? – Shreyes Jun 17 '13 at 17:52
  • @Shreyes care to post that as a question? Feel free to borrow any of the code in this answer. – Roman Luštrik Jun 17 '13 at 18:42