How to detect all binary character columns (each column has a different set of character values) in a data frame and convert them to 1s and 0s all at once?

Question

Is there a clean way to detect all binary character columns in a data frame and convert them to 1s and 0s all at once? For example, a column that contains only "yes" and "no" values, a column that contains only "day" and "night" values, and so on would all be converted to 1s and 0s by the same piece of code, without my having to specify the words "yes", "no", "day", "night", "good", "bad" (the list goes on).

Welcome to SO, Keo! *"the list goes on"* suggests one of two things: (1) You are asking if there is a package that already has all of these pre-defined, therefore you are asking us to *"recommend or find a book, tool, software library"*, which is [off-topic](https://stackoverflow.com/help/on-topic) on SO; and/or (2) you expect us to come up with all of the relevant combinations of binary data, and you'll judge if we do a good job. That doesn't work well on SO. Please read about reproducible questions: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, May 19 '20 at 15:58

Auggie Heschmeyer · Answer 1 · 2020-05-19T16:15:54.483

library(tidyverse)

# Create a dataset with two binary variables (x2 and x4)
sample_data <- tibble(x1 = rnorm(1000),
       x2 = rbinom(1000, 1, 0.5),
       x3 = rpois(1000, 5),
       x4 = sample(c("yes", "no"), 1000, replace = TRUE))

# Determine which variables have two levels and save them
binary_vars <- sample_data %>%  
  # This line calculates how many different values are present within each variable
  map_df(~ unique(.) %>% length()) %>% 
  # These lines just clean up the results
  gather() %>% 
  arrange(value) %>% 
  filter(value == 2) %>% 
  # This line pulls the variable names
  pull(key)

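# Define a function to convert a binary variable to 1s and 0s
# (for character columns, as_factor() orders levels by first appearance,
# so the first value seen becomes 0 and the other becomes 1)
make_binary <- function(vct) {
  vct %>% 
    as_factor() %>% 
    as.numeric() %>% 
    `-`(1)
}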

# Mutate the relevant variables
sample_data %>%
  mutate_at(binary_vars,
            make_binary)
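
For comparison, the same detect-then-convert idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: the helper names `is_two_valued` and `to_01` and the small `df` are hypothetical and not part of the answer above. Unlike the `unique(.) %>% length()` check, this sketch ignores NA when counting distinct values. Which value ends up as 0 depends only on which appears first in the column.

```r
# Hypothetical helper: TRUE when a column has exactly two distinct non-NA values
is_two_valued <- function(x) length(unique(x[!is.na(x)])) == 2

# Hypothetical helper: the first value seen becomes 0, the other becomes 1; NA stays NA
to_01 <- function(x) as.integer(factor(x, levels = unique(x[!is.na(x)]))) - 1L

# Small illustrative data frame (made up for this sketch)
df <- data.frame(
  answer = c("yes", "no", "yes", NA),
  time   = c("day", "night", "night", "day"),
  size   = c("s", "m", "l", "m"),
  stringsAsFactors = FALSE
)

# Detect the two-valued columns, then convert only those
two_valued <- vapply(df, is_two_valued, logical(1))
df[two_valued] <- lapply(df[two_valued], to_01)
```

Here `answer` and `time` are converted, while `size` (three distinct values) is left as character.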

score 1 · Accepted Answer · answered May 19 '20 at 16:08

Here's a cut that lets you define your own set of binaries.

The premise is that the first value in each vector maps to 1 and the remaining entries all map to 0. In this case I've used vectors of length 2, but you could feasibly use longer ones.

binaries <- list(
  c("yes", "no"), c("day", "night"), c("on", "off"), c("true", "false")
)

dat <- data.frame(
  v1 = c("yes", NA, NA),
  v2 = c("yes", "maybe", "no"),
  v3 = c("true", "false", NA),
  v4 = c("hello", "goodbye", NA),
  stringsAsFactors = FALSE
)

possibly_binary <- function(x, binaries, na.rm = TRUE) {
  if (na.rm) binaries <- lapply(binaries, c, NA)
  foundsomething <- sapply(binaries, function(b) all(x %in% b))
  if (any(foundsomething)) {
    one <- binaries[[ which(foundsomething)[1] ]][1]
    return(+(x == one))
  } else return(x)
}

Here it is in action. The na.rm= argument of the function controls how NA is handled. If it is TRUE, then NA is effectively added to each of the binaries vectors, so a column containing NA can still match, though the NA values are kept as NA in the returned data.

dat
#     v1    v2    v3      v4
# 1  yes   yes  true   hello
# 2 <NA> maybe false goodbye
# 3 <NA>    no  <NA>    <NA>

dat[] <- lapply(dat, possibly_binary, binaries = binaries)
dat
#   v1    v2 v3      v4
# 1  1   yes  1   hello
# 2 NA maybe  0 goodbye
# 3 NA    no NA    <NA>
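
To see what the na.rm= argument changes, the following sketch repeats the function from this answer and calls it on the values of v1 both ways. The shortened `binaries` list and the variable names `with_na` and `without_na` are only for illustration.

```r
binaries <- list(c("yes", "no"), c("true", "false"))

# Same function as above, repeated so this block runs on its own
possibly_binary <- function(x, binaries, na.rm = TRUE) {
  if (na.rm) binaries <- lapply(binaries, c, NA)
  foundsomething <- sapply(binaries, function(b) all(x %in% b))
  if (any(foundsomething)) {
    one <- binaries[[ which(foundsomething)[1] ]][1]
    return(+(x == one))
  } else return(x)
}

v1 <- c("yes", NA, NA)

# Default: NA is tolerated, so the column matches c("yes", "no") and is converted
with_na <- possibly_binary(v1, binaries)

# na.rm = FALSE: NA counts as a non-matching value, so the column is returned unchanged
without_na <- possibly_binary(v1, binaries, na.rm = FALSE)
```

With the default, `with_na` is numeric with the NA values preserved; with `na.rm = FALSE`, `without_na` is still the original character vector.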

How come "yes" and "no" in v2 did not become "1" and "0" and "hello" and "goodbye" did not become "1" and "0"? — Keo, May 19 '20 at 16:25
In v2, there is a non-binary "maybe", so the column is clearly not binary. In v4, I specifically did not include it in my `binaries` list. Feel free to extend that list of vectors to include those which you believe I have omitted. The point of this answer is that it gives you the ability to control your own binary candidates. — r2evans, May 19 '20 at 16:33
You can auto-discover possible binary candidates from your data with something like `Filter(function(a) length(na.omit(a)) == 2, sapply(dat, unique))`. — r2evans, May 19 '20 at 17:16
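
A slightly more defensive sketch of that auto-discovery idea, assuming the same `dat` as in the answer. It uses `lapply` rather than `sapply`, because `sapply` can simplify its result to a matrix when every column happens to have the same number of unique values, and it drops NA before counting. The names `uniques` and `candidates` are hypothetical.

```r
dat <- data.frame(
  v1 = c("yes", NA, NA),
  v2 = c("yes", "maybe", "no"),
  v3 = c("true", "false", NA),
  v4 = c("hello", "goodbye", NA),
  stringsAsFactors = FALSE
)

# Distinct non-NA values per column; lapply always returns a list
uniques <- lapply(dat, function(col) unique(col[!is.na(col)]))

# Keep only the columns with exactly two distinct values
candidates <- Filter(function(u) length(u) == 2, uniques)
```

For this data, `candidates` holds the value pairs found in v3 and v4. Passing `unname(candidates)` as the `binaries` argument of `possibly_binary` would then treat the first value seen in each of those columns as 1.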
