
The other solution, marked as a duplicate, gave me an error when I tried it on my dataset, which also contains categorical data.

I have a table with several columns. One column, column A, has 0, 1, 2, 3, 4 as values. These are codes for a certain condition. I'm trying to create/add another column, column Z, to the table which has 0 if the value in column A is 0 and 1 if the value in column A is 3 or 4. I'm trying to do it via this:

for (i in 1:nrow(pheno_table))
    if pheno_table$columnA == 0
     then pheno_table$newcolumnZ<-0
    elsif pheno_table$columnA == 3 | pheno_table$columnA == 4
     then pheno_table$newcolumnZ<-0

Thanks so much @see24! Also, I did try this and set the working directory, but I am not able to see the file in the folder (I checked the paths):

    setwd('/pathtofolder/')

    library(dplyr)
    df <- data.frame(A = originaltablefile$column_of_interest)
    newcolumn <- df %>%
      mutate(newcolumn = case_when(A == 0 ~ 0,
                                   A %in% c(3, 4) ~ 1,
                                   TRUE ~ NA_real_))
    finaltablefile <- cbind(originaltablefile, newcolumn)

I am not able to see finaltablefile in my folder.

analog_kid
  • Possible duplicate of [Combine mutate with conditional values](https://stackoverflow.com/questions/22337394/combine-mutate-with-conditional-values) – see24 Jul 23 '18 at 16:48

1 Answer


I like to use the `mutate` and `case_when` functions from the dplyr package:

library(dplyr)
df <- data.frame(A = c(1,2,3,4,0),B = c(3,4,5,6,7))
df2 <- df %>% mutate(Z = case_when(A == 0 ~ 0,
                            A %in% c(3,4) ~ 1,
                            TRUE ~ NA_real_))

I'm assuming that you want NA for rows that are not 0, 3, or 4. The `TRUE` part means "if none of the above are true, then...". You have to use `NA_real_` because `case_when` requires all the outputs to be of the same type.
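To make the type rule concrete, here is a minimal sketch on a made-up vector. The values and the names `A`, `Z_num`, and `Z_chr` are illustrative assumptions, not taken from the question. It shows the same conditions with numeric outputs and with character outputs, each paired with the matching typed NA.

```r
library(dplyr)

# Hypothetical codes, chosen only for illustration
A <- c(0, 1, 2, 3, 4)

# Numeric outputs, so the fallback is NA_real_
Z_num <- case_when(A == 0 ~ 0,
                   A %in% c(3, 4) ~ 1,
                   TRUE ~ NA_real_)

# Character outputs, so the fallback is NA_character_
Z_chr <- case_when(A == 0 ~ "absent",
                   A %in% c(3, 4) ~ "present",
                   TRUE ~ NA_character_)

Z_num
Z_chr
```

The codes 1 and 2 match neither condition, so they fall through to the `TRUE` branch and come out as NA in both vectors.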

see24
  • Alternately, could do a left join with `mdf = data.frame(A = c(0,3,4), Z = c(0, 1, 1)); df %>% left_join(mdf)` if all comparisons are with equality. – Frank Jul 16 '18 at 17:25
  • Thanks so much! I'm not too familiar with dplyr; could you please explain `NA_real_` a bit more? Also, could you suggest a good resource for learning dplyr? Thank you! – analog_kid Jul 16 '18 at 21:06
  • To learn more about dplyr I suggest this excellent free online book [R for Data Science](http://r4ds.had.co.nz/index.html). NA is of type logical and `case_when` requires that all the outputs are the same type. So if your other outputs are numbers you need NA_real_ and if they are character you need NA_character_. See the `case_when` [documentation](https://www.rdocumentation.org/packages/dplyr/versions/0.7.6/topics/case_when) examples for more – see24 Jul 17 '18 at 12:37
  • Thanks so much @see24! Also, I did try this and set the working directory, but I am not able to see the file in the folder (I checked the paths): `setwd('/pathtofolder/'); library(dplyr); df <- data.frame(A = originaltablefile$column_of_interest); newcolumn <- df %>% mutate(newcolumn = case_when(A == 0 ~ 0, A %in% c(3,4) ~ 1, TRUE ~ NA_real_)); finaltablefile <- cbind(originaltablefile, newcolumn)`. I am not able to see finaltablefile in my folder. – analog_kid Jul 17 '18 at 15:38
  • You have created `finaltablefile` in the R environment; if you want to save it to a folder, you need to do that explicitly. You can use `write.csv` to save it as a csv file, or you could use `save(finaltablefile, file = "finaltablefile.RData")` to save it as an R object that you can load in the future with `load()`. If you type `ls()` you will see the objects that you have created in the R environment. I recommend getting RStudio, as it makes keeping track of the objects in the environment much easier. – see24 Jul 17 '18 at 16:12
  • @analog_kid if this answer worked for you it would be much appreciated if you accepted it by clicking the check mark to the left of the answer. – see24 Jul 18 '18 at 16:48
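For completeness, here is a minimal sketch of the saving step described in the comments. It assumes a small stand-in data frame and writes to a temporary directory; the data values, `out_dir`, `csv_path`, and `rdata_path` are all hypothetical placeholders to swap for your own table and folder.

```r
# Stand-in for the real table; the values are made up for illustration
finaltablefile <- data.frame(A = c(0, 3, 4), Z = c(0, 1, 1))

# Write to a temporary directory here; swap in your own folder path
out_dir <- tempdir()
csv_path <- file.path(out_dir, "finaltablefile.csv")
rdata_path <- file.path(out_dir, "finaltablefile.RData")

# Plain-text copy that a spreadsheet program can open
write.csv(finaltablefile, csv_path, row.names = FALSE)

# Binary copy that stores the R object itself
save(finaltablefile, file = rdata_path)

# Objects in the environment versus files on disk
ls()
file.exists(csv_path)

# Later, restore the object under its original name
rm(finaltablefile)
load(rdata_path)
head(finaltablefile)
```

Creating an object with `<-` only puts it in the R session; nothing appears in the folder until one of the write or save calls runs.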