Is there a clean way to detect all binary character columns in a data frame and convert them to 1s and 0s all at once? For example, a column that contains only "yes" and "no" values, a column that contains only "day" and "night" values, and so on would all be converted to 1s and 0s by the same piece of code, without my having to specify the words "yes", "no", "day", "night", "good", "bad" (the list goes on).
-1
-
Welcome to SO, Keo! *"the list goes on"* suggests one of two things: (1) you are asking if there is a package that already has all of these pre-defined, in which case you are asking us to *"recommend or find a book, tool, software library"*, which is [off-topic](https://stackoverflow.com/help/on-topic) on SO; and/or (2) you expect us to come up with all of the relevant combinations of binary data, and you'll judge whether we do a good job. That doesn't work well on SO. Please read about reproducible questions: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans May 19 '20 at 15:58
2 Answers
2
library(tidyverse)

# Create a dataset with two binary variables (x2 and x4)
sample_data <- tibble(x1 = rnorm(1000),
                      x2 = rbinom(1000, 1, 0.5),
                      x3 = rpois(1000, 5),
                      x4 = sample(c("yes", "no"), 1000, replace = TRUE))

# Determine which variables have two levels and save them
binary_vars <- sample_data %>%
  # This line calculates how many different values are present within each variable
  map_df(~ unique(.) %>% length()) %>%
  # These lines just clean up the results
  gather() %>%
  arrange(value) %>%
  filter(value == 2) %>%
  # This line pulls the variable names
  pull(key)

# Define a function to convert all binary variables to 1s and 0s
# Note: as_factor() orders character levels by first appearance,
# so which value becomes 0 depends on the order of the rows
make_binary <- function(vct) {
  vct %>%
    as_factor() %>%
    as.numeric() %>%
    `-`(1)
}

# Mutate the relevant variables
sample_data %>%
  mutate_at(binary_vars,
            make_binary)
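
A variant sketch, not part of the original answer: because the question asks about character columns specifically, the detection and conversion can be folded into a single `mutate()` call with `across()` and `where()` (this assumes dplyr 1.0.0 or later). The helper name `is_binary_chr` and the toy data are hypothetical. Unlike the `unique()` count above, this version ignores `NA` when counting values, and `factor()` assigns levels alphabetically, so "no" becomes 0 and "yes" becomes 1.

```r
library(dplyr)

# Hypothetical toy data: two binary character columns, one non-binary, one numeric
toy <- tibble(
  answer = c("yes", "no", NA, "yes"),
  time   = c("day", "night", "night", "day"),
  mood   = c("good", "bad", "meh", "good"),
  score  = c(1.5, 2.0, 3.5, 4.0)
)

# Hypothetical helper: TRUE for character columns with exactly two non-missing values
is_binary_chr <- function(x) is.character(x) && n_distinct(x, na.rm = TRUE) == 2

# factor() sorts levels alphabetically; the first level maps to 0, the second to 1
converted <- toy %>%
  mutate(across(where(is_binary_chr), ~ as.integer(factor(.x)) - 1L))

converted
```

Here `answer` and `time` are converted, while `mood` (three values) and `score` (numeric) are left alone. Whether alphabetical order is the right way to pick the 1 is a judgment call that depends on the data.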

Auggie Heschmeyer
1
Here's a cut that lets you define your own set of binaries.
The premise is that the first value in each vector maps to 1 and every other entry maps to 0. Here I've used vectors of length 2, but longer vectors would work too.
binaries <- list(
  c("yes", "no"), c("day", "night"), c("on", "off"), c("true", "false")
)
dat <- data.frame(
  v1 = c("yes", NA, NA),
  v2 = c("yes", "maybe", "no"),
  v3 = c("true", "false", NA),
  v4 = c("hello", "goodbye", NA),
  stringsAsFactors = FALSE
)
possibly_binary <- function(x, binaries, na.rm = TRUE) {
  if (na.rm) binaries <- lapply(binaries, c, NA)
  foundsomething <- sapply(binaries, function(b) all(x %in% b))
  if (any(foundsomething)) {
    one <- binaries[[ which(foundsomething)[1] ]][1]
    return(+(x == one))
  } else return(x)
}
Here it is in action. We control what `NA` does with the `na.rm=` argument to the function. If it is true, then `NA` is effectively added to each of the `binaries` vectors, though it will be kept as `NA` in the returned data.
dat
#     v1    v2    v3      v4
# 1  yes   yes  true   hello
# 2 <NA> maybe false goodbye
# 3 <NA>    no  <NA>    <NA>
dat[] <- lapply(dat, possibly_binary, binaries = binaries)
dat
#   v1    v2 v3      v4
# 1  1   yes  1   hello
# 2 NA maybe  0 goodbye
# 3 NA    no NA    <NA>
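
For contrast, a small sketch of what happens with `na.rm = FALSE`. The function and `binaries` list are copied from above so the block runs on its own; the data frame `dat2` and its column `v5` are made up for illustration. With `na.rm = FALSE`, `NA` is not added to the candidate vectors, so any column containing `NA` fails the `all(x %in% b)` check and is returned unchanged.

```r
# Copied from the answer above so that this block is self-contained
binaries <- list(
  c("yes", "no"), c("day", "night"), c("on", "off"), c("true", "false")
)
possibly_binary <- function(x, binaries, na.rm = TRUE) {
  if (na.rm) binaries <- lapply(binaries, c, NA)
  foundsomething <- sapply(binaries, function(b) all(x %in% b))
  if (any(foundsomething)) {
    one <- binaries[[ which(foundsomething)[1] ]][1]
    return(+(x == one))
  } else return(x)
}

# Hypothetical data: v1 and v3 contain NA, v5 does not
dat2 <- data.frame(
  v1 = c("yes", NA, NA),
  v3 = c("true", "false", NA),
  v5 = c("on", "off", "on"),
  stringsAsFactors = FALSE
)

# Strict mode: a column with any NA no longer counts as binary
strict <- dat2
strict[] <- lapply(dat2, possibly_binary, binaries = binaries, na.rm = FALSE)
str(strict)
```

Only `v5` is converted here; `v1` and `v3` stay as character columns because of their missing values.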

r2evans
-
How come "yes" and "no" in v2 did not become "1" and "0" and "hello" and "goodbye" did not become "1" and "0"? – Keo May 19 '20 at 16:25
-
In v2, there is a non-binary "maybe", so the column is clearly not binary. In v4, I specifically did not include it in my `binaries` list. Feel free to extend that list of vectors to include those which you believe I have omitted. The point of this answer is that it gives you the ability to control your own binary candidates. – r2evans May 19 '20 at 16:33
-
You can auto-discover possible binary candidates from your data with something like `Filter(function(a) length(na.omit(a)) == 2, sapply(dat, unique))`. – r2evans May 19 '20 at 17:16
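
A sketch of how that auto-discovery idea could be wired into a conversion step, under a couple of assumptions: it uses `lapply()` in place of `sapply()` (which can simplify its result to a matrix when every column has the same number of unique values), and it treats the first value seen in each column as the 1, which is an arbitrary choice. The variable names `distinct_vals` and `candidates` are hypothetical.

```r
dat <- data.frame(
  v1 = c("yes", NA, NA),
  v2 = c("yes", "maybe", "no"),
  v3 = c("true", "false", NA),
  v4 = c("hello", "goodbye", NA),
  stringsAsFactors = FALSE
)

# Distinct non-missing values per column; lapply() always returns a list
distinct_vals <- lapply(dat, function(col) unique(col[!is.na(col)]))

# Keep only the columns with exactly two distinct non-missing values
candidates <- Filter(function(a) length(a) == 2, distinct_vals)
names(candidates)

# Convert the discovered columns: first value seen becomes 1, the other 0, NA stays NA
for (nm in names(candidates)) {
  dat[[nm]] <- +(dat[[nm]] == candidates[[nm]][1])
}
dat
```

With this data, `v3` and `v4` are picked up, while `v1` (one distinct value) and `v2` (three distinct values) are left as they are. Note that this treats any two-valued column as binary, which may or may not match what you consider binary in your own data.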