R: how to fill in missing values from another dataset efficiently?

Question

Two input datasets:

A <- data.frame(id = c(1, 2, 3), value = rep(NA, 3))
A 
     id value
  <dbl> <lgl>
1     1    NA
2     2    NA
3     3    NA

B <- data.frame(id = c(3, 2), value = c(3, 2))
B
  id value
1  3     3
2  2     2

After filling in the values that are available in B, A is expected to look like this:

A 
     id value
  <dbl> <dbl>
1     1    NA
2     2     2
3     3     3

This can be achieved with the following for loop. However, row-by-row for loops are generally slow in R. How can this be done more efficiently?

for (i in seq_len(nrow(A))) {
  item <- A[i, ]
  # Fill only values that are missing in A and whose id also appears in B
  if (is.na(item$value) && (item$id %in% B$id)) {
    A[i, "value"] <- B[B$id == item$id, ]$value
  }
}
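One common way to avoid the row-by-row loop is a vectorized lookup with base R's `match()`: a single call finds which row of B corresponds to each id in A, and all missing values are then assigned at once. This is a minimal sketch that assumes the ids in B are unique (with duplicates, `match()` silently uses the first occurrence).

```r
A <- data.frame(id = c(1, 2, 3), value = rep(NA, 3))
B <- data.frame(id = c(3, 2), value = c(3, 2))

# Row of B holding each id of A (NA when the id is not present in B)
idx <- match(A$id, B$id)

# Only touch cells that are missing in A and have a counterpart in B
to_fill <- is.na(A$value) & !is.na(idx)
A$value[to_fill] <- B$value[idx[to_fill]]
A
```

Because the assignment writes doubles into the logical `value` column, R coerces that column to double, which matches the expected output.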

A join can solve this problem, but it requires a rule for resolving conflicts between the two value columns.
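A base R sketch of that idea uses `merge()` with `all.x = TRUE` as a left join and then applies the conflict rule explicitly. The rule chosen here (take B's value whenever B has one, otherwise keep A's) is an assumption; the `.A`/`.B` suffixes are just illustrative names.

```r
A <- data.frame(id = c(1, 2, 3), value = rep(NA, 3))
B <- data.frame(id = c(3, 2), value = c(3, 2))

# Left join: keep every row of A; the clashing "value" columns get suffixes
m <- merge(A, B, by = "id", all.x = TRUE, suffixes = c(".A", ".B"))

# Assumed conflict rule: B wins where it has a value, otherwise keep A's
m$value <- ifelse(is.na(m$value.B), m$value.A, m$value.B)
result <- m[, c("id", "value")]
result
```

Note that `merge()` sorts the result by `id` by default, so the original row order of A is only preserved here because A is already sorted.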

@nrussell, this needs B to overwrite A. Which type of join does that correspond to? — HappyCoding, Feb 03 '17 at 15:22
Left join, then coalesce: `left_join(A, B, by = "id") %>% mutate(value = coalesce(value.x, value.y)) %>% select(id, value)`. — nrussell, Feb 03 '17 at 15:22
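Since `coalesce()` returns the first non-missing value among its arguments, the argument order is the conflict rule: `coalesce(value.x, value.y)` keeps A's existing values and only fills gaps, while `coalesce(value.y, value.x)` lets B overwrite A. The sketch below uses a hypothetical A that already holds a value (10) for id 2, so the difference between the two orders is visible.

```r
library(dplyr)

# Hypothetical A: id 2 already has a value that conflicts with B's
A <- data.frame(id = c(1, 2, 3), value = c(NA, 10, NA))
B <- data.frame(id = c(3, 2), value = c(3, 2))

joined <- left_join(A, B, by = "id")

# A's value wins; B only fills the gaps
keep_A <- joined %>% mutate(value = coalesce(value.x, value.y)) %>% select(id, value)

# B's value wins whenever B has one
prefer_B <- joined %>% mutate(value = coalesce(value.y, value.x)) %>% select(id, value)

keep_A
prefer_B
```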
Here is a base R method with `match`: `A$value[A$id %in% B$id] <- B$value[match(A$id, B$id)[!is.na(match(A$id, B$id))]]` that works with the example. — lmo, Feb 03 '17 at 15:44

Wietze314 · Answer 1 · 2017-02-03T15:25:50.130

You can use a join (dplyr):

library(dplyr)

A <- data.frame(id = c(1, 2, 3), value = rep(NA, 3))
B <- data.frame(id = c(3, 2), value = c(3, 2))

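# Keep every row of A and attach B's values as value.y
A %>% left_join(B, by = 'id') %>%
  # Use A's value where present, otherwise B's
  mutate(value = ifelse(is.na(value.x), value.y, value.x)) %>%
  # Drop the helper columns created by the join
  select(id, value)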
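A different dplyr technique for the same task is `rows_patch()` (available since dplyr 1.0.0), which overwrites only the `NA` cells of one table with values from another, matching rows on a key. This is a sketch under one assumption worth flagging: `rows_patch()` casts B's columns to A's types, and the logical column that `rep(NA, 3)` creates likely cannot hold B's doubles, so here `value` starts as `NA_real_`.

```r
library(dplyr)

# value starts as double (NA_real_) so its type matches B$value
A <- data.frame(id = c(1, 2, 3), value = rep(NA_real_, 3))
B <- data.frame(id = c(3, 2), value = c(3, 2))

# Fill only the NA cells of A with B's values, matching rows on id
A <- rows_patch(A, B, by = "id")
A
```

By default every id in B must also exist in A; otherwise `rows_patch()` raises an error rather than silently dropping rows.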

See the comments on your question to learn more about joining data.

edited Feb 03 '17 at 15:25

answered Feb 03 '17 at 15:21


Thanks. mutate works well with the join in this case. – HappyCoding Feb 03 '17 at 15:24
The `ifelse(is.na(value.x),value.y,value.x)` can also be achieved by `coalesce(value.x, value.y)`. – Brian Stamper Jun 05 '20 at 17:44