0

I am looking to merge a smaller and a bigger dataframe. Everytime I use merge() it returns the small dataframe, but I want the larger data frame where unknown "code" variables are NA.

Thanks!

bigdata <- data.frame(title = c("a", "b", "c", "d", "e", "f"))
#   code title
#1   NA     a
#2   NA     b
#3   NA     c
#4   NA     d
#5   NA     e
#6   NA     f

smalldata <- data.frame(code = rep(1,3), title = c("a", "b", "c"))
#   code title
#1    1     a
#2    1     b
#3    1     c

#I get the following output: 
merge(bigdata, smalldata by = "title")
#   title code
#1     a    1
#2     b    1
#3     c    1

#instead, I want something like this, with the large dataframe absorbing the codes from the #smalldata, and containing NA for unknown codes. 
#   title  code 
#1    a    1
#2    b    1
#3    c    1
#4    d    NA
#5    e    NA
#6    f    NA


camille
  • 16,432
  • 18
  • 38
  • 60
scarlett rouge
  • 339
  • 2
  • 7
  • 1
    Look at the docs for `merge` and the `all.x` and `all.y` arguments: " logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output." – camille Oct 06 '20 at 16:55
  • 1
    Have you tried to add `all = TRUE` to your merge? – c0bra Oct 06 '20 at 16:56
  • 1
    Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – camille Oct 06 '20 at 16:57

2 Answers2

0
> merge(bigdata, smalldata, by = "title", all = T)
  title code
1     a    1
2     b    1
3     c    1
4     d   NA
5     e   NA
6     f   NA

Using dplyr

> bigdata %>% left_join(smalldata)
Joining, by = "title"
  title code
1     a    1
2     b    1
3     c    1
4     d   NA
5     e   NA
6     f   NA
> 
Karthik S
  • 11,348
  • 2
  • 11
  • 25
0

We can also do this with data.table

library(data.table)
setDT(bigdata)[smalldata, code := code, on = .(title)]

-output

bigdata
#   title code
#1:     a    1
#2:     b    1
#3:     c    1
#4:     d   NA
#5:     e   NA
#6:     f   NA
akrun
  • 874,273
  • 37
  • 540
  • 662