I'm pretty new to R and not sure how to find variables based on their values and then convert them to numeric.

I have looked at "How do I change a value coded as "Yes" to a value of 1 in R?" and "Convert data.frame column format from character to factor".

These are my examples. I'm basically converting character variables that have only 'N' and 'Y' to 0 and 1, respectively. After going through some of the variables individually, I was wondering if there's a faster way to solve this problem. There are obviously other character variables that do not have "Y"/"N" so I don't want to just find all character variables and convert them into numeric. Please let me know if you have any ideas!

My code:

df$var3 <- ifelse(df$var3 == "Y", 1, 0)
df$var4 <- ifelse(df$var4 == "Y", 1, 0)
df$var6 <- ifelse(df$var6 == "Y", 1, 0)
df$var7 <- ifelse(df$var7 == "Y", 1, 0)

sample df (pre):

n = c(2, 3, 5, 8, 10) 
var1 = c("aa", "bb", "cc", "dd", "ee") 
var2 = c(TRUE, FALSE, TRUE, TRUE, TRUE) 
var3 = c("Y", "N", "Y", NA, "N") 
var4 = c("Y", "N", "Y", NA, "Y") 
var5 = c("aa", "bb", "cc", "dd", "ee") 
var6 = c("Y", "N", "Y", "Y", "N") 
var7 = c("Y", "N", "Y", "N", "N") 
df = data.frame(n, var1, var2, var3,var4,var5,var6,var7) 
df <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)

sample df (post, what I want):

n = c(2, 3, 5, 8, 10) 
var1 = c("aa", "bb", "cc", "dd", "ee") 
var2 = c(TRUE, FALSE, TRUE, TRUE, TRUE) 
var3 = c("1", "0", "1", NA, "0") 
var4 = c("1", "0", "1", NA, "1") 
var5 = c("aa", "bb", "cc", "dd", "ee") 
var6 = c("1", "0", "1", "1", "0") 
var7 = c("1", "0", "1", "0", "0") 
df = data.frame(n, var1, var2, var3,var4,var5,var6,var7) 
Sun

2 Answers

The easiest option (if we know the index of the columns) is to subset the columns of interest, compare them to "Y" to get a logical matrix (==), coerce that to binary (+), and assign it back to the same columns

i1 <- c(4, 5, 7, 8)
df[i1] <- +(df[i1] == "Y")
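
If the column positions might shift, the same idea can be written with column names instead of indices. This is only a minimal sketch on a cut-down version of the question's data; the `cols` vector is an assumption about which columns hold Y/N values.

```r
# Sample data in the shape of the question's df (all columns stored as character)
df <- data.frame(
  n    = c("2", "3", "5", "8", "10"),
  var1 = c("aa", "bb", "cc", "dd", "ee"),
  var3 = c("Y", "N", "Y", NA, "N"),
  var4 = c("Y", "N", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Assumption: the Y/N columns are known by name
cols <- c("var3", "var4")

# Compare to "Y" (logical matrix), then unary + turns TRUE/FALSE into 1/0;
# NA stays NA
df[cols] <- +(df[cols] == "Y")
str(df)
```

Unlike `ifelse(x == "Y", 1, 0)`, there is no explicit else branch here: anything that is not "Y" becomes 0 and missing values stay missing.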

If we don't have the index and have to check each column individually, loop through the columns, check whether each one is a factor with only the levels 'N' and 'Y', and if so convert it to a logical vector and then to integer with as.integer

df[] <- lapply(df, function(x) if(is.factor(x) && all(levels(x) %in% c("Y", "N"))) 
                  as.integer(x == "Y") else x)
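
The question's columns are character rather than factor, so the `is.factor` test above would skip them. A variant for character columns might look like the following sketch; `is_yn` is a hypothetical helper name, and `na.omit` is used so that missing values do not stop a column from qualifying.

```r
# Cut-down version of the question's data, every column stored as character
df <- data.frame(
  n    = c("2", "3", "5", "8", "10"),
  var1 = c("aa", "bb", "cc", "dd", "ee"),
  var3 = c("Y", "N", "Y", NA, "N"),
  var6 = c("Y", "N", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for a character column whose non-missing
# values are all "Y" or "N"
is_yn <- function(x) is.character(x) && all(na.omit(x) %in% c("Y", "N"))

# Convert only the qualifying columns; leave everything else untouched
df[] <- lapply(df, function(x) if (is_yn(x)) as.integer(x == "Y") else x)
str(df)
```

One caveat with this check: a character column that is entirely NA also passes it (there are no non-missing values to fail the test) and would be turned into an integer column of NA.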
akrun
  • The original data variables are chr. Do you suggest first converting them into factors, or should I try `if(is.character(x) && all(levels(x) %in% c(...`? – Sun Jan 02 '19 at 20:52
  • @Sun In that case it would be `is.character(x) && all(unique(x) %in% c("N", "Y"))` – akrun Jan 02 '19 at 20:54
  • @akrun `dplyr::mutate_if` would also be useful, I suppose. It's basically the same answer, just easier to comprehend and write, but not necessarily faster. Cheers. – M-- Jan 02 '19 at 20:57
  • @M-M That should also work: `df %>% mutate_if(~ is.character(.x) && all(unique(na.omit(.x)) %in% c("N", "Y")), funs(as.integer(. == "Y")))` – akrun Jan 02 '19 at 21:04
  • @Sun I would also include `na.omit` to take care of the NA elements: `is.character(x) && all(unique(na.omit(x)) %in% c("N", "Y"))` – akrun Jan 02 '19 at 21:04
  • @akrun Thank you! `df[] <- lapply(df, function(x) if(is.character(x) && all(unique(na.omit(x)) %in% c("Y", "N"))) as.integer(x=="Y") else x)` worked. – Sun Jan 02 '19 at 21:06
  • I was convinced that `as.integer(logical)` was the fastest way. I had always compared it with `0L + (logical)`. As it turns out, your way, with no `0L`, is the fastest. – Rui Barradas Jan 02 '19 at 22:28

The following solution works in base R, with no additional packages needed:

If you want to make the change across the whole data frame, you can use the lines below. The drawback of using `ifelse` in this specific scenario is that you are forced to supply the else value, so any value that is not "Y" gets overwritten, including ones you did not anticipate.

df[df == 'N'] <- 0
df[df == 'Y'] <- 1
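
One thing to be aware of, shown here as a small sketch on made-up data in the shape of the question's columns: when the columns are character, the replaced values are stored as the strings "0" and "1", so an explicit conversion is still needed if numeric columns are the goal.

```r
# Small example in the shape of the question's data (character columns)
df <- data.frame(
  var1 = c("aa", "bb", "cc"),
  var3 = c("Y", "N", NA),
  stringsAsFactors = FALSE
)

# Replace every "N"/"Y" cell anywhere in the data frame
df[df == "N"] <- 0
df[df == "Y"] <- 1

# The column is still character at this point
class(df$var3)

# Convert explicitly if an integer column is wanted; NA stays NA
df$var3 <- as.integer(df$var3)
str(df)
```

Because the replacement applies to every cell, a literal "Y" or "N" in an unrelated character column would be replaced as well.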
Toolbox