I have data that looks like this:

 df <- read.table(tc <- textConnection("
     var1    var2    var3    var4
      1       1       7      NA
      4       4       NA      6
      2       NA      3       NA                
      4       4       4       4              
      1       3       1      1"), header = TRUE); close(tc)

I'm trying to create a new column, var5, that is 1 if any of var1 to var4 in that row equals 1, and 0 otherwise.

My non-working code looks like this:

 df$var5 = ifelse("1" %in% df$var1,1,
                ifelse("1" %in% df$var2,1,
                      ifelse("1" %in% df$var3,1,
                           ifelse("1" %in% df$var4,1,0))))

giving me a table:

    var1    var2    var3    var4   var5
      1       1       7      NA      1
      4       4      NA       6      1
      2      NA       3      NA      1
      4       4       4       4      1
      1       3       1       1      1

The table I actually want should look like

    var1    var2    var3    var4   var5
      1       1       7      NA      1
      4       4      NA       6      0
      2      NA       3      NA      0
      4       4       4       4      0
      1       3       1       1      1

I've looked at the posts:

ifelse not working as expected in R

and

Loop over rows of dataframe applying function with if-statement

but I couldn't get any answer to my problem.

Mikee
2 Answers

A working version puts the column on the left-hand side of %in%:

with(df, ifelse(var1 %in% 1,1,
            ifelse(var2 %in% 1,1,
                  ifelse(var3 %in% 1,1,
                       ifelse(var4 %in% 1,1,0)))))
#[1] 1 0 0 0 1

The reason is that 1 %in% df$var1 returns only a single element, TRUE or FALSE, saying whether 1 occurs anywhere in the column.

1 %in% df$var1
#[1] TRUE

Likewise, every column contains a 1, so every one of these tests is TRUE. The outermost ifelse therefore returns the single value 1, which is recycled into every row of var5.

The reverse order, on the other hand,

df$var1 %in% 1
#[1]  TRUE FALSE FALSE FALSE  TRUE

returns a logical vector with the same length as the original column. In essence, the result of %in% always has the length of its left-hand operand.
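
A minimal sketch of that length rule, using the var1 values from the question. The names x, scalar_test, and vector_test are illustrative and not part of the original post.

```r
# var1 from the question's data
x <- c(1, 4, 2, 4, 1)

# Scalar on the left: one answer for the whole column
scalar_test <- 1 %in% x
length(scalar_test)
#[1] 1

# Column on the left: one answer per row
vector_test <- x %in% 1
length(vector_test)
#[1] 5

# ifelse() returns a result shaped like its test, so a length-1 test
# gives a length-1 result, which is then recycled on assignment
ifelse(scalar_test, 1, 0)
#[1] 1
ifelse(vector_test, 1, 0)
#[1] 1 0 0 0 1
```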


ifelse is not actually required. A better option is to use rowSums on the logical matrix (df == 1), check whether each row sum is non-zero, and convert the result to binary with as.integer.

as.integer(rowSums(df == 1, na.rm =TRUE)!=0)
#[1] 1 0 0 0 1
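
The same idea can be sketched step by step and stored as the new column. The data frame is rebuilt here with data.frame() so the block runs on its own, and the intermediate names is_one and n_ones are illustrative.

```r
# Same values as the question's data
df <- data.frame(
  var1 = c(1, 4, 2, 4, 1),
  var2 = c(1, 4, NA, 4, 3),
  var3 = c(7, NA, 3, 4, 1),
  var4 = c(NA, 6, NA, 4, 1)
)

# Logical matrix: TRUE where a cell equals 1, NA where the cell is NA
is_one <- df == 1

# Count the TRUEs per row, ignoring NA cells (counts: 2 0 0 0 3)
n_ones <- rowSums(is_one, na.rm = TRUE)

# Flag rows with at least one match and store the flag as 0/1
df$var5 <- as.integer(n_ones > 0)
df$var5
#[1] 1 0 0 0 1
```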

Another option is Reduce with |, after replacing the NA values with 0 so that no comparison returns NA:

as.integer(Reduce(`|`, lapply(replace(df, is.na(df), 0), `==`, 1)))
#[1] 1 0 0 0 1
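
To see why the NA values are replaced first, it may help to look at how | treats NA. This is a small sketch on two of the question's columns; the names raw and clean are illustrative.

```r
# In R's three-valued logic, NA | TRUE is TRUE but NA | FALSE is NA
NA | TRUE
#[1] TRUE
NA | FALSE
#[1] NA

# Two columns from the question, compared against 1 without NA handling
var2 <- c(1, 4, NA, 4, 3)
var3 <- c(7, NA, 3, 4, 1)
raw <- Reduce(`|`, list(var2 == 1, var3 == 1))
raw
#[1]  TRUE    NA    NA FALSE  TRUE

# Replacing NA with 0 first makes every comparison TRUE or FALSE
clean <- Reduce(`|`, lapply(list(var2, var3),
                            function(v) replace(v, is.na(v), 0) == 1))
clean
#[1]  TRUE FALSE FALSE FALSE  TRUE
```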
akrun

Instead of using ifelse separately for every column, you can check row-wise whether 1 exists anywhere in the row and return 1 or 0 accordingly:

as.numeric(apply(df, 1, function(x) any(x == 1)) %in% TRUE) 
#[1] 1 0 0 0 1

Just to explain the steps better:

apply(df, 1, function(x) any(x == 1))
#[1]  TRUE    NA    NA FALSE  TRUE

apply(df, 1, function(x) any(x == 1)) %in% TRUE
#[1]  TRUE FALSE FALSE FALSE  TRUE

as.numeric(apply(df, 1, function(x) any(x == 1)) %in% TRUE)
#[1] 1 0 0 0 1
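
As a possible alternative to the %in% TRUE step, any() accepts an na.rm argument that drops the NA comparisons before testing. This is a sketch rather than part of the steps above; the data frame is rebuilt with data.frame() so the block runs on its own, and row_has_one is an illustrative name.

```r
# Same values as the question's data
df <- data.frame(
  var1 = c(1, 4, 2, 4, 1),
  var2 = c(1, 4, NA, 4, 3),
  var3 = c(7, NA, 3, 4, 1),
  var4 = c(NA, 6, NA, 4, 1)
)

# na.rm = TRUE removes NA comparisons, so rows without a 1 give FALSE
row_has_one <- apply(df, 1, function(x) any(x == 1, na.rm = TRUE))

as.numeric(row_has_one)
#[1] 1 0 0 0 1
```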
Ronak Shah