Creating factor variables from levels of other factor variables with if statement

Question

I need to produce some new factor variables in my dataset which contain information from existing factor variables.

In the first case I need to produce a binary NewVariable based on whether certain values occur in a specific variable that has more than 100 levels. I use revalue() from the plyr package, namely:

NewVar <- if(OldVar1=="helen" | OldVar1=="greg") 
             {NewVar <-revalue(OldVar1, c("helen"="participant", "greg"="participant"))}
          else {NewVar=="nonparticipant"}

What I actually want is to collapse specific levels of the old variable into a single level of the new variable. As you can imagine, the above code does not work, but I cannot figure out why.

In the second case I need to combine information from three existing factor variables (OldVar1, OldVar2, OldVar3) in order to fill in the levels of a multi-categorical NewVariable. I run this code:

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")

I get the error: Error: unexpected '=' in "OldVar=". The same occurs when I remove one of the = signs in OldVar1=="a".

Is it possible to create a factor NewVariable with its levels and labels without filling them with the string values in advance? I was not able to find something on that, the tutorials I see have produced their data and they just have to label the existing values.

Also, I would like to assign values to the rest of my cases, which belong to OptionA, OptionB, OptionC, etc. Is this possible by writing a separate if statement for each option, as in the following?

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
NewVariable="OptionB" <- if(OldVar1=="a" & OldVar2=="d" & OldVar3=="e")

=== EDIT ===

For the second "challenge" I followed the code suggested by DWin. I produced an interaction of the three variables that appear in the if(...) above and put inside c() only the values that I needed, for example:

OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3))
levels(OldVar.ALL.interactions) # search for the levels that we need to include 
# in the NewVar
# below I follow DWin's code
NewVar <- factor(rep(NA, length(AnotherVarOfTheDataset) ),
                     levels=c("OptionA", "OptionB", ...))
NewVar[OldVar.ALL.interactions %in% c("...interaction.of.Old.Variables...")] <- "OptionA"
# the same as in OptionA for the rest of the levels
# the ** NewVar[ is.na(NewVar) ]  <- "nonparticipant" ** of DWin's code is not needed

Is there any other way to solve this issue without using the interaction between the Old factor variables?

You cannot collapse levels easily anymore by manipulating the levels attribute. I started to do something like `levels(NewVar) <- gsub("greg|helen" ...) ` and realized that would fail. You also cannot use: `else {NewVar=="nonparticipant"}` if you wanted to do an assignment. Then there is the whole problem that `if` and `else` are not vectorized. — IRTFM, May 29 '13 at 22:40
It does appear that `plyr::revalue` will let you collapse levels, so it is probably the incorrect use of `if` and `else` instead of `ifelse` that is part of what is tripping you up. There is also no "all.others" = "other_level" argument for `revalue`. — IRTFM, May 29 '13 at 22:54
By "not vectorized" you mean that they will not run all the length of the vector of the dataset? Is this why noah's suggestion includes an argument length.out=10 ? — Pulse, May 29 '13 at 23:11
Right. `if` and `else` take arguments of exactly length 1. They are program control functions and do not operate as persons might expect when they are prior users of SAS or SPSS where the data steps all have implicit column actions. — IRTFM, May 29 '13 at 23:16
Your explanation of "not vectorized" is correct, but that doesn't have anything to do with `length.out`; that's just setting up an example data set (since you didn't provide one) with length 10. — Aaron left Stack Overflow, May 30 '13 at 00:40
For the second part of the question, depending how many options you have, you could just use subsetting on each of your boolean conditions, or, if there are a lot, you might create a second data set and merge it with your data. It would help a lot, both in understanding what you're currently doing, and in suggesting something else, if you provided a [reproducible example](http://stackoverflow.com/q/5963269/210673). And by the way, welcome to Stack Overflow! (Love your icon, by the way...) — Aaron left Stack Overflow, May 30 '13 at 00:53
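
The two ideas in the last comment (subsetting on each boolean condition, or merging with a second data set) might look roughly like the sketch below. The data frame `dat`, its values, and the lookup table `key` are invented for illustration; only the names OldVar1, OldVar2, OldVar3, OptionA and OptionB come from the question.

```r
# Hypothetical example data; only the column names come from the question.
dat <- data.frame(
  OldVar1 = factor(c("a", "a", "x", "a")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "c", "c"))
)

# Approach 1: start from an all-NA factor and fill it with one vectorized
# logical condition per option (no interaction() needed).
dat$NewVariable <- factor(rep(NA, nrow(dat)), levels = c("OptionA", "OptionB"))
isA <- with(dat, OldVar1 == "a" & OldVar2 == "b" & OldVar3 == "c")
isB <- with(dat, OldVar1 == "a" & OldVar2 == "d" & OldVar3 == "e")
dat$NewVariable[isA] <- "OptionA"
dat$NewVariable[isB] <- "OptionB"

# Approach 2: keep the wanted combinations in a small lookup table (assumed
# name: key) and merge it in; all.x = TRUE keeps unmatched rows with NA.
key <- data.frame(
  OldVar1 = c("a", "a"),
  OldVar2 = c("b", "d"),
  OldVar3 = c("c", "e"),
  NewVariable2 = c("OptionA", "OptionB")
)
merged <- merge(dat, key, by = c("OldVar1", "OldVar2", "OldVar3"), all.x = TRUE)
```

Note that merge() may reorder the rows, so the lookup approach is easier to work with if the data carry an id column. With many options the lookup table is probably the easier of the two to maintain.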

IRTFM · Accepted Answer · 2013-05-30T05:40:23.920

2

I'd probably start out with an empty factor variable (assuming that you wanted to have a factor as was implied by the subject line):

NewVar <- factor(rep(NA, length(OldVar) ), 
                 levels=c("participant", "nonparticipant") )   
NewVar[ OldVar %in% c("a", "b", "c")] <- "participant"
NewVar[ is.na(NewVar) ]             <- "nonparticipant"

If you don't mind having a character vector, then something along these lines:

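x <- c("a", "b", "c")                # example input (assumed; not defined in the original)
y <- vector("character", length(x))  # empty character vector of the same length
y[ x %in% c("a","c")] <- "p"         # elements matching "a" or "c"
y[ !x %in% c("a","c")] <- "np"       # everything else
y
# [1] "p"  "np" "p"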
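
Since the comments point to `ifelse` as the vectorized counterpart of `if`/`else`, the binary case from the question could also be sketched as follows. The vector `OldVar1` and the values "tom" and "ann" are invented for illustration; "helen" and "greg" are the names used in the question.

```r
# Hypothetical data; "helen" and "greg" are the names used in the question.
OldVar1 <- factor(c("helen", "tom", "greg", "ann"))

# ifelse() tests every element, unlike if(), which expects a length-1 condition.
NewVar <- factor(
  ifelse(OldVar1 %in% c("helen", "greg"), "participant", "nonparticipant"),
  levels = c("participant", "nonparticipant")
)
table(NewVar)
```

Wrapping the result in factor() with explicit levels keeps both levels available even if one of them happens not to occur in the data.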

edited May 30 '13 at 05:40

answered May 29 '13 at 22:44


It worked! Only I had to change the levels into levels=c("",""), otherwise levels(NewVar) would give me only the second level and many NAs, since the first line (<- "participant") gave me "Warning message: ... invalid factor level, NA generated" – Pulse May 29 '13 at 23:06
Assuming you mean `levels=c("participant", "nonparticipant")`, that would be correct; @DWin's got a minor bug there, which I'm sure he'll fix as soon as he sees these messages. You'll get better answers if you can provide a small reproducible example (like what noah made for you); this will allow others to test possible solutions. – Aaron left Stack Overflow May 30 '13 at 00:45
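
The warning quoted in the first comment can be reproduced in a small sketch (the vector `f` and the value "other" are hypothetical). Assigning a string that is not among a factor's declared levels yields NA with a warning, which is why the levels have to be listed when the factor is created.

```r
# Empty factor with two declared levels, as in the answer.
f <- factor(rep(NA, 3), levels = c("participant", "nonparticipant"))
f[1] <- "participant"   # fine: "participant" is a declared level

# "other" is not a declared level; capture the resulting warning text.
msg <- tryCatch({ f[2] <- "other"; NULL },
                warning = function(w) conditionMessage(w))
msg

# To allow a new value, extend the levels first, then assign.
levels(f) <- c(levels(f), "other")
f[2] <- "other"
f
```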
