How can I identify a column in an R data frame using a variable? In the following code, I used paste0 to build the column names from variables. Is there an alternative?

if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='Yes'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] > 0) & (leadsnp4[[paste0('Z_in_',trait2)]] < 0))
{leadsnp4$ConcordEffect='No'} else if ((leadsnp4[[paste0('Z_in_',trait1)]] < 0) & (leadsnp4[[paste0('Z_in_',trait2)]] > 0))
{leadsnp4$ConcordEffect='No'}

leadsnp4 is a data frame. trait1 and trait2 are user-defined variables. The above code gives me the warning "the condition has length > 1 and only the first element will be used", and I am also not getting the expected output. I am not sure what is wrong here. Maybe there are alternatives to the above if/else statements. Any help?

zillur rahman
  • What are `trait1` and `trait2`? Can you post some data using `dput`? – AndS. Aug 06 '21 at 22:01
  • This question is related to this one: https://stackoverflow.com/questions/14170778/interpreting-condition-has-length-1-warning-from-if-function – Pearl Aug 06 '21 at 22:23
  • It would be easier to help if you created a small reproducible example along with the expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Aug 07 '21 at 06:40

2 Answers

Here is an explanation of why a pasted string cannot be used with $ to reference a column, along with one suggestion for what you can do instead: "Dynamically select data frame columns using $ and a character value"
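To illustrate the difference, here is a minimal sketch. The data frame `df` and the variable `col_name` are made up for the example and are not from the question.

```r
# Hypothetical toy data frame with one column following the 'Z_in_' pattern
df <- data.frame(Z_in_a = c(1, -2, 3))

# Column name built from a variable, as in the question
col_name <- paste0("Z_in_", "a")

# `$` does not evaluate its argument: it looks for a column literally
# named "col_name", finds none, and returns NULL
by_dollar <- df$col_name

# `[[` evaluates its argument, so the pasted string selects the column
by_brackets <- df[[col_name]]
```

So the paste0 call is not the problem in itself; what matters is that the resulting string is passed to `[[` (or `[`) and not to `$`.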

Pearl

The way you're selecting columns is fine. For a plain data.frame, df[[col_name]] (list-style indexing) returns the same thing as df[, col_name]: each gives a vector copy of column col_name. You can save the column name in a variable instead of calling paste0 directly in the selection.
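As a quick check of that equivalence, here is a small sketch on a made-up data frame (`df`, `other` and `col_name` are illustrative names, not from the question):

```r
# Hypothetical toy data frame; only the 'Z_in_' naming follows the question
df <- data.frame(Z_in_a = c(1, -2, 3), other = c("x", "y", "z"))

# Store the column name once instead of repeating paste0 in every selection
col_name <- paste0("Z_in_", "a")

v1 <- df[[col_name]]   # list-style extraction
v2 <- df[, col_name]   # matrix-style extraction, dropped to a vector

# Both are plain numeric vectors with the same values
same <- identical(v1, v2)
```

This holds for a base data.frame; other data frame classes may treat `df[, col_name]` differently.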

The reason you're getting the warning is that if is not vectorized and you're giving it a vector with length > 1. In this case, if uses only the first value in the vector and warns that it's doing so (since R 4.2.0 this is an error instead of a warning). ifelse is the vectorized version in base R (there's also dplyr::if_else). If I understand your code, the below should be close to what you're looking for.

t1 <- paste0('Z_in_', trait1)
t2 <- paste0('Z_in_', trait2)

# a single boolean vector indicating if trait1 and trait2 are 
# both positive or both negative
same_sign <- ((leadsnp4[, t1] > 0) & (leadsnp4[, t2] > 0)) | 
  ((leadsnp4[, t1] < 0) & (leadsnp4[, t2] < 0))

leadsnp4$ConcordEffect <- ifelse(same_sign, "Yes", "No")

Note that if the value in either column is equal to 0, same_sign is FALSE and the row is assigned "No". You'll need to modify the logic if this is not the desired behavior.
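If zeros should not count as discordant, one possible variation is to compare the signs directly. This is only a sketch: the toy data, the trait names "bmi" and "height", and the choice to mark zero rows as NA are all assumptions made for illustration.

```r
# Hypothetical toy data; column names mirror the question's 'Z_in_' pattern
leadsnp4 <- data.frame(
  Z_in_bmi    = c(1.2, -0.5, 0.0,  2.1),
  Z_in_height = c(0.7, -1.1, 3.0, -0.4)
)
trait1 <- "bmi"      # assumed example value
trait2 <- "height"   # assumed example value

t1 <- paste0("Z_in_", trait1)
t2 <- paste0("Z_in_", trait2)

# Product of signs: 1 = same sign, -1 = opposite signs, 0 = at least one zero
s <- sign(leadsnp4[[t1]]) * sign(leadsnp4[[t2]])

# Assumption: rows with a zero in either column get NA instead of "No"
leadsnp4$ConcordEffect <- ifelse(s > 0, "Yes",
                                 ifelse(s < 0, "No", NA_character_))
```

Whether a zero should map to NA, "No", or something else depends on what the analysis needs.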

ngwalton