The accepted answer to this question:
creating a factor variable with dplyr?
did not impress Hadley, and the follow-up answer does not generalise well to some of the problems I've come across. I'm wondering whether the community can do better with a simpler example:
### DATA ###
set.seed(24)  # make the simulated data reproducible
A = round(runif(200,0,1),0)
B = c(1 - A[1:100],rep(0,100))
C = c(rep(0,100), 1 - A[101:200])
dummies <- as.data.frame(cbind(A,B,C))
header <- c("Christian", "Muslim", "Athiest")
names(dummies) <- header
### ONE WAY ###
dummies$Religion <- factor(ifelse(dummies$Christian == 1, "Christian",
                           ifelse(dummies$Muslim == 1, "Muslim",
                           ifelse(dummies$Athiest == 1, "Athiest", NA))))
This solution mimics the result given to the OP in the link above. Is there a simpler function that collapses the dummy variables into one factor variable, like the egen group function in Stata? A simple one-liner would be great.
Timing the suggested solutions, including Akrun's, with system.time() (thank you):
set.seed(24)
A = round(runif(2e6,0,1),0)
B = c(1 - A[1:1e6],rep(0,1e6))
C = c(rep(0,1e6), 1 - A[1000001:2000000])
dummies <- as.data.frame(cbind(A,B,C))
header <- c("Christian", "Muslim", "Athiest")
names(dummies) <- header
library(dplyr)  # %>%, rowwise(), transmute(), case_when()
library(tidyr)  # gather()
#Alistaire
system.time({
  dummies %>% rowwise() %>%
    transmute(religion = names(.)[as.logical(c(Christian, Muslim, Athiest))])
})
# user system elapsed
# 56.08 0.00 56.08
system.time({
  dummies %>% transmute(religion = case_when(
    as.logical(Christian) ~ 'Christian',
    as.logical(Muslim) ~ 'Muslim',
    as.logical(Athiest) ~ 'Atheist'))
})
# user system elapsed
# 0.22 0.04 0.27
#Curt F.
system.time({
  dummies %>%
    gather(religion, is_valid) %>%   # long format: one row per (column, value) pair
    filter(is_valid == 1) %>%        # keep only the flags that are set
    select(-is_valid)
})
# user system elapsed
# 0.33 0.03 0.36
#Akrun
system.time({
  names(dummies)[as.matrix(dummies) %*% seq_along(dummies)]
})
# user system elapsed
# 0.13 0.06 0.21
system.time({
  names(dummies)[max.col(dummies, "first")]
})
# user system elapsed
# 0.04 0.07 0.11
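One caveat on the two one-liners: both assume that every row contains exactly one 1. With an all-zero row, max.col(..., "first") returns column 1, and the %*% version yields index 0, which silently drops the element. Below is a minimal base R sketch of a guarded version that returns NA for such rows, as the nested ifelse() does. The data frame toy and the helper collapse_dummies are hypothetical names made up for illustration, not part of any package or of the benchmark above.

```r
# Hypothetical toy data; the last row has no category set.
toy <- data.frame(Christian = c(1, 0, 0, 0),
                  Muslim    = c(0, 1, 0, 0),
                  Athiest   = c(0, 0, 1, 0))

# Illustrative helper (an assumption, not from any package): collapse 0/1
# columns into one factor, giving NA for rows without exactly one 1.
collapse_dummies <- function(df) {
  m   <- as.matrix(df)
  idx <- max.col(m, ties.method = "first")  # position of the first row maximum
  idx[rowSums(m) != 1] <- NA                # guard all-zero or multi-flag rows
  factor(names(df)[idx], levels = names(df))
}

religion <- collapse_dummies(toy)
print(religion)

# On rows with exactly one 1, both one-liners return the same labels.
ok <- toy[1:3, ]
by_matmul <- names(ok)[as.matrix(ok) %*% seq_along(ok)]
by_maxcol <- names(ok)[max.col(ok, "first")]
stopifnot(identical(by_matmul, by_maxcol))
```

The rowSums() check adds one more pass over the data, so it will be somewhat slower than the bare one-liners; I have not timed it on the 2e6-row data.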
I find that Akrun's solution is the fastest, and he provided two one-liners. Many thanks to the others as well for their unique approaches to the problem and the generous supply of coding methods that I would like to learn more about, especially the use of %*%, names(.), is_valid and the qdapTools package.
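For anyone else curious about the %*% trick, here is a small worked sketch on a single hypothetical row (one_row is an illustrative name, not from the code above). Multiplying a 0/1 row by the vector of column positions leaves only the position of the column that is set, and that position is then used to look up the column name.

```r
# One dummy row in which only the second column is set.
one_row <- matrix(c(0, 1, 0), nrow = 1,
                  dimnames = list(NULL, c("Christian", "Muslim", "Athiest")))

# Matrix product with the column positions 1, 2, 3:
# 0*1 + 1*2 + 0*3 = 2, i.e. the index of the flagged column.
pos   <- one_row %*% seq_len(ncol(one_row))
label <- colnames(one_row)[pos]
stopifnot(pos[1, 1] == 2, label == "Muslim")

# max.col() reaches the same index by locating the largest entry in each row.
stopifnot(max.col(one_row, ties.method = "first") == 2)
```

As far as I can tell from the code above, names(.) works because the dot in a %>% pipeline stands for the data frame being piped in, and is_valid is simply the name chosen for the value column in the gather() call.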