
I'm not an advanced R user, but normally sooner or later I find the help I need. Well, not this time. I have a data frame called "df" and I'm trying to create an extra column "Sel" that stores information based on the other columns. To do that I used a nested ifelse call; the code is below. It works for the first two conditions but not for the last two, where I use the AND operator. I don't see any difference when comparing my usage to other examples, and I don't get errors; the value for those conditions is simply not filled in. (I've also tried &&.) What am I doing wrong? Thanks in advance for any help!

df <- data.frame(
  Gene = c("A","B","C","D","E"),
  P_a = c(NA, NA, 21010, 14941,12),
  E_a = c(NA, NA, "miss_b", "miss_b",NA),
  P_b = c(1,200,32,NA,21),
  E_b = c(NA, NA, "miss_a", NA,"miss_a"),
  Eq = c("no", "yes", NA, NA,NA )
  )
df$Sel <- ifelse(
  (df$Eq == "no"), "same",
  ifelse((df$Eq == "yes"), "diff",
         ifelse (df$E_a == "miss_b" & 
                 df$E_b == "miss_a", "G_P",
                    ifelse(is.na(df$P_b & df$E_b &
                                df$Eq),"in","out"
                                                      ))))

This is the df_result that I would expect to generate with my code:

df_result <- data.frame(
  Gene = c("A","B","C","D","E"),
  P_a = c(NA, NA, 21010, 14941,12),
  E_a = c(NA, NA, "miss_b", "miss_b",NA),
  P_b = c(1,200,32,NA,21),
  E_b = c(NA, NA, "miss_a", NA,"miss_a"),
  Eq = c("no", "yes", NA, NA, NA ),
  Sel = c("same","diff", "G_P","in", "out")
)
Sedlin
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please don't post pictures of data because we cannot copy/paste those values for testing. – MrFlick Sep 23 '20 at 19:36
  • I know you are completely right. I will implement it. – Sedlin Sep 23 '20 at 20:03

1 Answer

Here is your modified code.

df$Sel <- ifelse(
  (df$Eq %in% "no"), "same",
  ifelse((df$Eq %in% "yes"), "diff",
         ifelse (df$E_a %in% "miss_b" & 
                   df$E_b %in% "miss_a", "G_P",
                 ifelse(is.na(df$P_b) & is.na(df$E_b) &
                                is.na(df$Eq),"in","out"
                 ))))


df$Sel 
[1] "same" "diff" "G_P"  "in"   "out" 

The first problem was the use of ==. As long as there are no NA values in your data, == works perfectly. But comparing NA with == returns NA, not the TRUE or FALSE you want. ifelse needs a logical test (TRUE or FALSE); wherever the test is NA, the result is NA as well. That happened in your third row, as you mentioned.

You can try this as an example: ifelse(3 == NA, 1, 2). You might expect 2 as output, because 3 is not NA. But you get NA as output.
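A minimal sketch of that NA behaviour, using made-up variable names (cmp, res, eq, eq_test) that are not part of the original code. It only illustrates how == and ifelse treat NA; it does not touch the df from the question.

```r
# Comparing a value with NA via == gives NA, not FALSE
cmp <- 3 == NA
print(cmp)

# ifelse() passes an NA test through to the result
res <- ifelse(cmp, 1, 2)
print(res)

# The same happens element-wise on a vector shaped like df$Eq
eq <- c("no", "yes", NA)
eq_test <- eq == "no"
print(eq_test)
```

The third element of eq_test is NA, so a nested ifelse built on it has nothing to branch on for that row.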

Instead of == use %in%.
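To see the difference side by side, here is a small sketch on a stand-alone vector (eq, with_equals, with_in, explicit and sel are illustrative names, not from the original code). It also shows an equivalent spelling with an explicit is.na check, in case you prefer to keep ==.

```r
eq <- c("no", "yes", NA)

# == propagates NA
with_equals <- eq == "no"

# %in% always returns TRUE or FALSE, so NA simply counts as "no match"
with_in <- eq %in% "no"

# Equivalent alternative: guard the == comparison with !is.na()
explicit <- !is.na(eq) & eq == "no"

# With a clean logical test, ifelse() fills every element
sel <- ifelse(with_in, "same", "other")

print(with_equals)
print(with_in)
print(sel)
```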

The second problem was in is.na(df$P_b & df$E_b & df$Eq). is.na() takes a single object, so each variable needs its own is.na() call: ifelse(is.na(df$P_b) & is.na(df$E_b) & is.na(df$Eq), "in", "out").
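As a rough illustration, the sketch below rebuilds the three columns as plain vectors (p_b, e_b, eq, all_missing and combined are names chosen here, not from the original code). It checks each vector with its own is.na() call, and wraps the combined form in tryCatch because & is not defined for character vectors. In the question's code that error probably never surfaced because ifelse only evaluates its "no" argument when at least one test element is FALSE, which was not the case there.

```r
p_b <- c(1, 200, 32, NA, 21)
e_b <- c(NA, NA, "miss_a", NA, "miss_a")
eq  <- c("no", "yes", NA, NA, NA)

# One is.na() per vector, combined with the vectorised &
all_missing <- is.na(p_b) & is.na(e_b) & is.na(eq)
print(all_missing)

# The combined form applies & to the raw columns first;
# with a character operand that raises an error
combined <- tryCatch(
  is.na(p_b & e_b & eq),
  error = function(e) "error"
)
print(combined)
```

Only the fourth element of all_missing is TRUE, which matches the single "in" in the expected Sel column.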

tamtam
  • This is perfect! Thank you for the modifications in the code and for the exhaustive explanation. – Sedlin Sep 23 '20 at 22:32