I have two large dataframes. A minimal, reproducible example of them looks like this:

A <- data.frame(A=c("a","b","c","d"), B=c(1,2,3,4), C=c(1,2,NA,NA), D=c(1,2,3,4))
A

  A B  C D
1 a 1  1 1
2 b 2  2 2
3 c 3 NA 3
4 d 4 NA 4

B <- data.frame(A=c("c","d"), B=c(3,4), C=c(3,4))
B

  A B C
1 c 3 3
2 d 4 4

For every row with an NA in A, I have a corresponding row in B that holds the replacement for the missing value. I would like to merge the two dataframes A and B into a "common" dataframe AB so that the NAs in dataframe A, column C are replaced by their corresponding values in dataframe B, column C. The result should look like this:

AB <- data.frame(A=c("a","b","c","d"), B=c(1,2,3,4), C=c(1,2,3,4), D=c(1,2,3,4))
AB

  A B C D
1 a 1 1 1
2 b 2 2 2
3 c 3 3 3
4 d 4 4 4

The closest I got to a solution (and it is not very close) was with the following code:

AB <- merge(A,B, all.x = TRUE)
AB

  A B  C D
1 a 1  1 1
2 b 2  2 2
3 c 3 NA 3
4 d 4 NA 4

This, obviously, just keeps the values from A. I have already consulted related questions.

Please consider that the real dataframes are much larger. If you need any further information, please let me know. Thanks in advance!

3 Answers

Using the data.table-package, you can perform an update-join, which should run fast on large datasets.

library(data.table)
# convert A and B to data.tables by reference
setDT(A)
setDT(B)
# update join: overwrite column C in A with column C from B, matching on columns A and B
A[B, C := i.C, on = .(A, B)]

output

#    A B C D
# 1: a 1 1 1
# 2: b 2 2 2
# 3: c 3 3 3
# 4: d 4 4 4
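
Note that the update join above overwrites C for every matched row, not only the rows where C is NA. If B might also contain rows whose counterpart in A already has a value, a variant with fcoalesce() (available in recent data.table versions) keeps the existing values and fills only the gaps. This is a minimal sketch, not a tested solution for the real data; the extra row ("a", 1, 99) in B is a hypothetical addition to show the difference.

```r
library(data.table)

A <- data.table(A = c("a", "b", "c", "d"), B = c(1, 2, 3, 4),
                C = c(1, 2, NA, NA), D = c(1, 2, 3, 4))
# hypothetical: B also holds a row for "a", whose C in A is not missing
B <- data.table(A = c("a", "c", "d"), B = c(1, 3, 4), C = c(99, 3, 4))

# update join on columns A and B; fcoalesce() takes the existing C
# where it is not NA and falls back to the value from B (i.C) otherwise
A[B, C := fcoalesce(C, i.C), on = .(A, B)]

# row "a" keeps its original C (1); rows "c" and "d" are filled from B
print(A)
```

Here C ends up as 1, 2, 3, 4, whereas the plain C := i.C version would have replaced the 1 in row "a" with 99.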
Wimpel

You could do something like this in base R:

index <- match(B$A, A$A) 

A$C[index] <- B$C

# A B C D
#1 a 1 1 1
#2 b 2 2 2
#3 c 3 3 3
#4 d 4 4 4
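
The assignment above assumes that every value of B$A occurs in A$A, and it overwrites C whether or not it was missing. A slightly more defensive sketch looks each row of A up in B and fills only the NAs. Like the code above, it assumes that column A alone identifies a row; the extra row ("a", 1, 99) in B is a hypothetical addition for illustration.

```r
A <- data.frame(A = c("a", "b", "c", "d"), B = c(1, 2, 3, 4),
                C = c(1, 2, NA, NA), D = c(1, 2, 3, 4))
# hypothetical: B also holds a row for "a", whose C in A is not missing
B <- data.frame(A = c("a", "c", "d"), B = c(1, 3, 4), C = c(99, 3, 4))

# position of each row of A in B (NA where A has no counterpart in B)
idx <- match(A$A, B$A)

# fill only rows where C is missing and a replacement exists
fill <- is.na(A$C) & !is.na(idx)
A$C[fill] <- B$C[idx[fill]]

print(A)
```

With this data C becomes 1, 2, 3, 4: row "a" keeps its own value, and rows "c" and "d" are filled from B.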
Matt

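# rbind() needs the same columns in both inputs, so first add column D from A to B
rbind(na.omit(A), merge(B, A[, c("A", "B", "D")], by = c("A", "B")))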

Clemsang
hello_friend
  • It's helpful if you can explain what this does and why it works, but it seems like it misses the purpose of what the OP wants—they aren't looking for just a row-binding, since that will miss the issue of replacement – camille Nov 04 '19 at 16:33