3

I have four data.frames which all have the same columns, being the first the same to all. In the variable columns there are some NAs.

First, I'd like to replace any value (which is not an NA) in each data.frame by the name of the data.frame. Second, I'd like to merge the data.frames. In this case, for each NA, there will be some other data.frame which would have a value for it, so that I'd end with every cell filled with values (or names of the data.frames).

Here's an example with two data.frames:

 >A
 name Q  W  E  R  T
 g1   NA NA 4  NA 0
 g2   3  2  NA 4  5
 g3   NA 1  NA 0  0
 g4   0  NA NA 1  9

 >B
 name Q  W  E  R  T
 g1   2  4  NA 1  NA
 g2   NA NA 5  NA NA
 g3   5  NA 0  NA NA
 g4   NA 6  4  NA NA

 >result
 name Q  W  E  R  T
 g1   B  B  A  B  A
 g2   A  A  B  A  A
 g3   B  A  B  A  A
 g4   A  B  B  A  A

I've tried some merge() and union() options differently. Also, I've tried to adapt answers to similar questions but I can't seem to solve this.

Creating a function to replace NAs from one data.frame with values from another

Merging data frames with missing values in R

Thank you in advance!

Community
  • 1
  • 1
afrendeiro
  • 2,423
  • 2
  • 23
  • 22

2 Answers2

4

This may not generalize well for you, but for the data provided...

A <- data.frame(Q=c(NA, 3, NA, 0),
                W=c(NA, 2, 1, NA),
                E=c(4, NA, NA, NA),
                R=c(NA, 4, 0, 1),
                T=c(0,5,0,9), row.names=paste0('g', 1:4), stringsAsFactors=FALSE)

B <- data.frame(Q=c(2, NA, 5, NA),
                W=c(4, NA, NA, 6),
                E=c(NA, 5, 0, 4),
                R=c(1, NA, NA, NA),
                T=c(NA, NA, NA, NA), row.names=paste0('g', 1:4), stringsAsFactors=FALSE)

The result will be "A" anywhere that is A is not NA. It will be "B" wherever B is not NA

result <- A
result[!is.na(A)] <- "A" 
result[!is.na(B)] <- "B"

#   Q W E R T
#g1 B B A B A
#g2 A A B A A
#g3 B A B A A
#g4 A B B A A
GSee
  • 48,880
  • 13
  • 125
  • 145
  • Thanks a lot! It works really good even on more complex data. The only thing is it introduces NAs in the first column (g1,g2...), but since the order is not changed it's just a matter of putting it back from one of the other data.frames. – afrendeiro Jul 23 '12 at 09:31
4

I've written a package for precisely this, as I am repeatedly sent partially overlapping excel files from people who are database illiterate.

I uploaded it to CRAN and it should be available shortly. What to do then is this

> install.packages("datamerge")
> library(datamerge)
> version.merge(A, B, add.values=TRUE)
Rows:  4 from `A` #1
       0 from `B` #2

Columns:
Q  Origin: `A` #1
   Imputed 2 values from `B` #2
W  Origin: `A` #1
   Imputed 2 values from `B` #2
E  Origin: `A` #1
   Imputed 3 values from `B` #2
R  Origin: `A` #1
   Imputed 1 values from `B` #2
T  Origin: `A` #1
   Q W E R T
g1 2 4 4 1 0
g2 3 2 5 4 5
g3 5 1 0 0 0
g4 0 6 4 1 9

If you want to try it before it's up on CRAN you can get it from http://www.anst.uu.se/chrba104/datamerge_1.0-1.tar.gz

Backlin
  • 14,612
  • 2
  • 49
  • 81
  • Thank you, really! I should invest a bit more in knowing packages like this. It works perfectly! Still, this would answer half (the biggest half, yes) of my question, because I'd like to change the contents as well, so with @GSee's answer I manage to do both. By the way, the package is already downloadable from CRAN. – afrendeiro Jul 23 '12 at 09:32
  • Also, another good point of this function is that you can merge several data.frames at once. Thanks! – afrendeiro Jul 23 '12 at 09:49
  • You're welcome! If it hadn't been for your question the package would still be lurking on my hard drive. – Backlin Jul 23 '12 at 15:53