7

I have a main table(a), containing column: id, age, and sex. eg.

a <- data.frame(id=letters[1:4], age=c(18,NA,9,NA), sex=c("M","F","F","M"))
  id age sex
1  a  18   M
2  b  NA   F
3  c   9   F
4  d  NA   M

And I have a supplement table(b), just containing all the missing data in table(a) or duplicated data in table(a). eg.

b <- data.frame(id=c("a","b","d"), age=c(18,32,20))
  id age
1  a  18
2  b  32
3  d  20

Now I want to merge the two table, like this:

  id age sex
1  a  18   M
2  b  32   F
3  c   9   F
4  d  20   M

However, I'd tried merge(a,b,by="id",all=T). The result is not what I want. Is there any way to solve this problem? Thank you!

Eric Chang
  • 2,580
  • 4
  • 19
  • 19

2 Answers2

4

We can use data.table

library(data.table)
setDT(a)[b, agei := i.age, on='id'][is.na(age), age := agei][,agei:= NULL][]
a
 #  id age sex
#1:  a  18   M
#2:  b  32   F
#3:  c   9   F
#4:  d  20   M
akrun
  • 874,273
  • 37
  • 540
  • 662
  • akrun, I recently posted a [question](https://stackoverflow.com/questions/46761065/join-and-overwrite-data-in-one-table-with-data-from-another-table) about how this update-only-NAs operation can be accomplished using all of the columns in the second data table (b in this example). In the OP's case, it only concerns the age column. Would you care to take a stab at that either in this post or my post? I like data.table solutions. – AlexR Oct 17 '17 at 21:44
  • @AlexR Looks like Frank provided some solution there – akrun Oct 18 '17 at 01:58
  • He did, but the solution involved melting data, which coerces all variables to the same type. I was hoping there might be something less "invasive" and looked forward to a diversity of opinions. In any event, thanks for checking it out. – AlexR Oct 18 '17 at 03:35
2

Here is a dplyr solution:

library(dplyr)

c <- left_join(a,b, by = "id") %>% # this will generate age.x and age.y
  mutate(age = ifelse(is.na(age.x), age.y, age.x)) %>% # we generate a joint 'age' variable
  select(-age.y, -age.x) # drop the superfluous columns

> c
  id sex age
1  a   M  18
2  b   F  32
3  c   F   9
4  d   M  20

Note that this will throw you a warning that you try to join on factor levels. This is because the example data in the reproducible example was generated with stringsAsFactors = T.

Felix
  • 1,611
  • 13
  • 22