I'm trying to create a variable (df$check6) that has a 1 if ANY of the following are true using the following code:

df$check  <- ifelse(df$var1 == 1 & df$var2 == 1, 1,0)
df$check2 <- ifelse(df$var3 == 1 & df$var4 == 1, 1,0) 
df$check3 <- ifelse(df$var5 == 1 & df$var6 == 1, 1, 0) 
df$check4 <- ifelse(df$var7 >=4 & df$var8 == 1, 1,0) 
df$check5 <- ifelse(df$var9 >=4 & df$var10 == 1, 1,0)

df$check6 <- ifelse(df$check== 1 | df$check2 == 1 | df$check3 == 1 | df$check4 == 1, df$check5 == 1, 1,0)

When I run the code, the values of df$var7 and df$var9 that were originally "." in my dataset are all changed to 1. My >= 4 condition also does not appear to be working: df$check6 is 1 when those two variables have the value 2, even though the condition requires them to be greater than or equal to 4.

I know there must be a simpler way to do this but I just tried to use the basics. Any suggestions would be appreciated!

EDIT: var 1, 3 and 5 were stored as 1, 0 or ".". I created a subset that only included values that were == 1 (excluding the 0 and "." cases).

I converted var 1-6, 8 and 10 to logical, as suggested; var 7 and 9 were numeric.

The answer provided then worked perfectly on my dataset.

Prccrt
  • Could you please provide a data frame `df` that demonstrates why the provided code is not working for you? You can read more about how to provide a reproducible example at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – josliber Oct 20 '15 at 22:54
  • If something is == "." then it's probably a factor. Factors do not respond to tests regarding order. – IRTFM Oct 20 '15 at 23:28
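
To illustrate the comment about factors, here is a minimal base R sketch. The vector `var7` is a hypothetical stand-in for a column that was read in with "." marking missing values; the real data may be stored differently.

```r
# Hypothetical column: "." for missing values forces a factor (or character) column
var7 <- factor(c(".", "2", "5"))

# Ordering comparisons on an unordered factor are not meaningful:
# R returns NA for every element and issues a warning
cmp_factor <- suppressWarnings(var7 >= 4)

# as.numeric() on a factor returns the internal level codes, not the labels,
# so convert through character first; "." then becomes NA
var7_num <- suppressWarnings(as.numeric(as.character(var7)))

# The comparison now behaves as expected on the real numbers
cmp_num <- var7_num >= 4
```

After the conversion, the ">= 4" test is FALSE for 2, TRUE for 5, and NA where the value was ".".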

1 Answer

I find the syntax for this much easier to comprehend in data.table (in addition to the other advantages provided by data.table). It will also be easier if the variables you're comparing to 1 are stored as logical (as it seems they should be):

library(data.table)
#convert df to a 'data.table' by reference
setDT(df)

df[ , check6 := (var1 & var2) | (var3 & var4) |
      (var5 & var6) | (var7 >= 4 & var8) | (var9 >= 4 & var10)]
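
For readers without data.table, the same condition can be sketched in base R. The toy data frame below is invented purely for illustration (it is not the asker's data), and it assumes the 0/1 variables are already logical and var7 and var9 are numeric.

```r
# Hypothetical toy data: logical flags plus two numeric columns
df <- data.frame(
  var1 = c(TRUE, FALSE, FALSE, FALSE),
  var2 = c(TRUE, FALSE, FALSE, FALSE),
  var3 = FALSE, var4 = FALSE, var5 = FALSE, var6 = FALSE,
  var7 = c(1, 5, 2, 1),
  var8 = c(FALSE, TRUE, TRUE, FALSE),
  var9 = c(1, 1, 2, 4),
  var10 = c(FALSE, FALSE, TRUE, TRUE)
)

# TRUE if ANY of the five paired conditions holds for a row
df$check6 <- with(df,
  (var1 & var2) | (var3 & var4) | (var5 & var6) |
    (var7 >= 4 & var8) | (var9 >= 4 & var10)
)
```

Row 3 is the case from the question: var7 and var9 are 2, so neither ">= 4" condition fires and check6 stays FALSE.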

If vars 1,2,3,4,5,6,8,10 are not already stored as logical, and they take values outside 0,1, you can quickly convert them all to logical with:

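#names of the columns that should be logical
lgkls <- paste0("var", c(1:6, 8, 10))
#replace each of those columns with TRUE where it equals 1
df[ , (lgkls) := lapply(.SD, function(x) x == 1), .SDcols = lgkls]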
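
The conversion above compares to 1 rather than calling as.logical(). A small base R sketch of the difference, using hypothetical vectors `x_chr` and `x_num` that are not from the original data:

```r
# Hypothetical character column with "." for missing values
x_chr <- c("1", "0", ".")

# Comparing to 1 coerces 1 to "1" and tests string equality
chr_compare <- x_chr == 1

# as.logical() only understands strings such as "TRUE"/"T"/"false",
# so "1", "0" and "." all become NA
chr_coerce <- as.logical(x_chr)

# Hypothetical numeric column with a value outside 0/1
x_num <- c(1, 0, 2)

# as.logical() treats every non-zero number as TRUE ...
num_coerce <- as.logical(x_num)

# ... while the comparison is TRUE only for values exactly equal to 1
num_compare <- x_num == 1
```

Note that with a character column the comparison turns "." into FALSE rather than NA, which may or may not be what is wanted.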

If you really need the intermediate check variables, you could do:

df[ , check1 := var1 & var2]
df[ , check2 := var3 & var4]
df[ , check3 := var5 & var6]
df[ , check4 := var7 >= 4 & var8]
df[ , check5 := var9 >= 4 & var10]
df[ , check6 := check1 | check2 | check3 | check4 | check5]

And of course if you really need check6 to be stored as an integer (doubtful), you can wrap the expression in the "cheater's" converter, a unary plus:

df[ , check6 := +(check1 | check2 | check3 | check4 | check5)]
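
As a quick illustration of what the unary plus does, here is a base R sketch on a hypothetical logical vector `flags`:

```r
# Hypothetical logical vector, including a missing value
flags <- c(TRUE, FALSE, NA)

# Unary plus coerces logical to integer: TRUE -> 1L, FALSE -> 0L, NA stays NA
as_int_plus <- +flags

# The more explicit spelling gives the same result
as_int_explicit <- as.integer(flags)
```

Both forms produce an integer vector; `as.integer()` is arguably easier to read if the intent needs to be obvious.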

See here for more about data.table & here for why it's good practice to avoid ifelse as much as possible.

MichaelChirico
  • Are var1-6, 8, and 10 exclusively 1 or 0 though? Because it seems like they are looking for values exactly equal to 1, not non-zero values. – bramtayl Oct 20 '15 at 23:09
  • @bramtayl I got the feeling they were probably stored as 1,0, but of course it's easy to convert them to `logical`s; see edit – MichaelChirico Oct 20 '15 at 23:11
  • @bramtayl var 2,4,6,10 are exclusively 1 or 0. Var 1,3,5 can be 1, 0 or "." – Prccrt Oct 21 '15 at 02:50
  • If the dots came in from reading a csv, you might want to add an extra argument: `read.csv(na.strings = ".")`. This will read in your variables as numeric, which will be useful. – bramtayl Oct 21 '15 at 03:32
  • @MichaelChirico I converted var 1-6, 8 and 10 to logical using as.logical() and 7 and 9 are numeric. Running the code above, I get "NA" for check2 when var3 and var4 are both 1, or when var4 = 1 and var3 = ".", or when var4 = 1 and var3 = 0. Could this be because my var 1, 3 and 5 can be stored as 1, 0 or "."? – Prccrt Oct 21 '15 at 18:24
  • @Prccrt I can't know that without a reproducible example, but I feel like BondedDust's comment is probably accurate. – MichaelChirico Oct 21 '15 at 20:05
  • @MichaelChirico Thanks for all the help. Sorry I couldn't make a reproducible sample for you (still working on figuring that out so I can make it easier to get help in the future). I figured out something that seems to work with my dataset and applied your code and it worked! Going to update my question above. – Prccrt Oct 22 '15 at 02:56
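
The `na.strings` suggestion from the comments can be tried on a small inline CSV. This is only a sketch: the text in `csv_text` is made up, and the asker's actual file and resulting NA pattern may differ.

```r
# Hypothetical CSV content where "." marks a missing value
csv_text <- "var3,var4,var7\n1,1,.\n.,1,2\n0,1,5"

# Treat "." as NA so the columns are read as numbers, not factors or strings
dat <- read.csv(text = csv_text, na.strings = ".")

# A missing var3 propagates: NA & TRUE is NA
check2 <- dat$var3 == 1 & dat$var4 == 1

# If a missing value should count as "condition not met", map NA to FALSE
check2_no_na <- !is.na(check2) & check2
```

Whether NA should stay NA or be treated as FALSE depends on what a missing value means in the dataset, so the last step is optional.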