
Here is some sample data.

vv  var1    var2
1   a   1/1/2010
1   c   1/3/2010
2   d   1/6/2010
3   a   1/8/2010
3   c   1/9/2010
4   a   1/10/2010
4   b   1/11/2010
5   d   1/13/2010
6   a   1/16/2010
6   b   1/17/2010
7   a   1/19/2010
7   b   1/20/2010
8   d   1/22/2010
9   a   1/25/2010
9   c   1/27/2010

I am trying to create new variables populated with values from other variables. I thought this would be easy enough. For example, I tried something like the code below.

data$new1[data$var1=="a"]<-data$var2
#or
data$new1[data$var1=="b" | data$var1=="c"]<-data$var2

I get the error "number of items to replace is not of replacement length". In my data, not every var1=="a" has a var2 value, so I am not sure why R is not just assigning NAs for the missing values, which is something I am okay with (I rather prefer it, actually). Basically, I want R to give new1 NA values wherever var1!="a".

I also tried

if (data$var1=="a") {data$new1<-data$var2} else {data$new1<-"NA"}

but I get the error "the condition has length > 1 and only the first element will be used".

Now, I think I can subset my data to only have data with var1=="a", and then assign my values, and then just merge back into the main data set with the all=T option to get the NA's imputed, but I really want to avoid doing this.

I am not really sure what the problem is. Any advice greatly appreciated. Cheers.

Output from running the code from the answers below:

data$new1 <- ifelse(data$var1 %in% c("b","c"),data$var2,NA)

       vv var1      var2 new1
    1   1    a  1/1/2010   NA
    2   1    c  1/3/2010   12
    3   2    d  1/6/2010   NA
    4   3    a  1/8/2010   NA
    5   3    c  1/9/2010   15
    6   4    a 1/10/2010   NA
    7   4    b 1/11/2010    3
    8   5    d 1/13/2010   NA
    9   6    a 1/16/2010   NA
    10  6    b 1/17/2010    6
    11  7    a 1/19/2010   NA
    12  7    b 1/20/2010    8
    13  8    d 1/22/2010   NA
    14  9    a 1/25/2010   NA
    15  9    c 1/27/2010   11
user27008
    Welcome to Stack Overflow! Please add a reproducible sample so that people here can help you. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – CHP Mar 11 '13 at 16:56

2 Answers


Since you haven't given sample data, this code is untested.

data$new1 <- rep(NA, nrow(data))
data$new1[data$var1=="a"] <- data$var2[data$var1=="a"]

The key here is to use the same subsetting expression (data$var1=="a") for both data$new1 and data$var2, so that both sides of the assignment have the same length.

The advantage of this method is that the subsetting condition doesn't have to be an equality test; any logical expression works.
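As a minimal sketch of the same idea on the first few rows of the question's sample data: the data frame below is rebuilt by hand, `stringsAsFactors = FALSE` is an assumption so that var2 stays character, and the helper names `keep`, `keep_bc` and `new2` are only illustrative.

```r
# Sample data rebuilt from the question (first six rows only);
# stringsAsFactors = FALSE is an assumption so var2 stays character.
data <- data.frame(
  vv   = c(1, 1, 2, 3, 3, 4),
  var1 = c("a", "c", "d", "a", "c", "a"),
  var2 = c("1/1/2010", "1/3/2010", "1/6/2010",
           "1/8/2010", "1/9/2010", "1/10/2010"),
  stringsAsFactors = FALSE
)

# Build the logical index once and reuse it on both sides.
keep <- data$var1 == "a"

data$new1 <- NA_character_           # start with NA in every row
data$new1[keep] <- data$var2[keep]   # both sides have sum(keep) elements

# Any logical expression works as the index, e.g. "b" or "c".
keep_bc <- data$var1 == "b" | data$var1 == "c"
data$new2 <- NA_character_
data$new2[keep_bc] <- data$var2[keep_bc]
```

Storing the index in a variable avoids typing the condition twice, which is where mismatched lengths tend to creep in.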

CHP

If I understand correctly, I think you want to use ifelse and %in% to perform this task:

data$new1 <- ifelse(data$var1 %in% c("b","c"),data$var2,NA)

Your original code subsets only the left-hand side of the assignment, which selects fewer elements than the full data$var2 vector on the right, hence the warnings you are getting.
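To make the length mismatch concrete, here is a small sketch on a made-up four-row data frame (an assumption, not the asker's real data; the name `idx` is illustrative too). It compares how many elements the index selects with how many values the right-hand side offers, then shows that ifelse returns one value per row.

```r
# Hypothetical four-row example; stringsAsFactors = FALSE is an assumption.
data <- data.frame(
  var1 = c("a", "c", "d", "b"),
  var2 = c("1/1/2010", "1/3/2010", "1/6/2010", "1/11/2010"),
  stringsAsFactors = FALSE
)

idx <- data$var1 %in% c("b", "c")

n_selected <- sum(idx)           # slots selected by the index
n_offered  <- length(data$var2)  # values in the full column

# ifelse() is vectorised: it returns a vector as long as the test,
# taking var2 where idx is TRUE and NA elsewhere.
data$new1 <- ifelse(idx, data$var2, NA)
```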

James
  • where will data$new1 be initially populated from? – CHP Mar 11 '13 at 16:57
  • Hi. Thanks for advice. I am getting the error `unexpected 'in' in data$new1 <- ifelse(data$var1 %in%`. – user27008 Mar 11 '13 at 17:10
  • @user27008 I'm not sure why that is happening. Can you post a reproducible example? – James Mar 11 '13 at 17:22
  • I got your code to work for the example. My data is just troublesome I guess. But I am losing the dates and just getting numbers as values for new1. – user27008 Mar 11 '13 at 17:45
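A likely explanation for the numbers in the last comment, assuming var2 was read in as a factor (the default for data.frame() and read.table() before R 4.0.0): ifelse drops the factor class and returns the underlying integer codes. The sketch below uses made-up vectors to show the effect and one possible fix; the month/day/year date format is inferred from the sample data.

```r
# Assumption: var2 is a factor, as it would be by default before R 4.0.0.
var1 <- c("a", "c", "d", "b")
var2 <- factor(c("1/1/2010", "1/3/2010", "1/6/2010", "1/11/2010"))

# ifelse() loses the factor class, so this yields integer codes, not dates.
codes <- ifelse(var1 %in% c("b", "c"), var2, NA)

# Converting to character first keeps the date strings.
dates_chr <- ifelse(var1 %in% c("b", "c"), as.character(var2), NA)

# Optionally parse the strings into Date objects afterwards.
dates <- as.Date(dates_chr, format = "%m/%d/%Y")
```

Parsing to Date after the ifelse call, rather than before, matters because ifelse would also strip the Date class from its result.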