3

I have data like this in R:

subjID = c(1,2,3,4)
var1 = c(3,8,NA,6)
var1.copy = c(NA,NA,5,NA)
fake = data.frame(subjID = subjID, var1 = var1, var1 = var1.copy)

which looks like this:

> fake
  subjID var1 var1.1
1      1    3     NA
2      2    8     NA
3      3   NA      5
4      4    6     NA

Var1 and Var1.1 represent the same variable, so each subject has NA for one column and a numerical value in the other (no one has two NAs or two numbers). I want to merge the columns to get a single Var1: (3, 8, 5, 6).

Any tips on how to do this?

3 Answers3

3

If you're only dealing with two columns, and there are never two numbers or two NAs, you can calculate the row mean and ignore missing values:

fake$fixed <- rowMeans(fake[, c("var1", "var1.1")], na.rm=TRUE)
Andrew
  • 36,541
  • 13
  • 67
  • 93
2

You can use is.na, which can be vectorised as:

# get all the ones we can from var1
var.merged = var1;
# which ones are available in var1.copy but not in var1?
ind = is.na(var1) & !is.na(var1.copy);
# use those to fill in the blanks
var.merged[ind] = var1.copy[ind];
Cramer
  • 1,785
  • 1
  • 12
  • 20
  • Elegant solution! Thank you. (I had to use square brackets in the last line, though, to prevent R from trying to call var.merged and var1.copy as functions.) – Maya Mathur Mar 06 '12 at 03:07
  • Thanks for the catch, I'm switching between R and MATLAB and my brain gets confused sometimes. – Cramer Mar 06 '12 at 06:15
2

It depends on how you want to merge if there are conflicts.

You could simply put all non-NA values in var.1.1 into the corresponding slot of var1. In case of conflicts, this will favour var.1.1.

var1[!is.na(var1.copy)] <- var1.copy[!is.na(var1.copy)]

You could just fill in all NA values in var1 with corresponding values of var1.1. In case of conflict, this will favour var1.

var1[is.na(var1)] <- var1.copy[is.na(var1)]
mathematical.coffee
  • 55,977
  • 11
  • 154
  • 194