Updated: With apologies to those who replied, in my original example I overlooked the fact that data.frame() created var as a factor rather than as a character vector, as I had intended. I have corrected the example, and this will break at least one of the answers.
--original--
I have a data frame on which I'm performing a series of dplyr and tidyr manipulations, and I would like to add indicator-variable columns, encoded as 0 or 1, within the dplyr chain. Each level of a factor (presently stored as a character vector) should be encoded in a separate column, with the column name formed by concatenating a fixed prefix with the level. For example, in rows where var has level a, the new column var_a will be 1, and in all other rows var_a will be 0.
The following minimal example using base R produces exactly the results that I want (thanks to this blog post), but I'd like to roll it all into the dplyr chain, and can't quite figure out how to do it.
library(dplyr)
df <- data.frame(var = sample(x = letters[1:4], size = 10, replace = TRUE), stringsAsFactors = FALSE)
# Add one 0/1 indicator column per distinct value of var
for (level in unique(df$var)) {
  df[paste("var", level, sep = "_")] <- ifelse(df$var == level, 1, 0)
}
Note that the real data set contains multiple columns, none of which should be altered or dropped when creating the indicator variables, with the exception of the column var, which could be converted to type factor.
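To make the target output concrete, here is the same base R loop applied to a small fixed input instead of a random sample. The id column and the four values of var are made up for illustration; id simply stands in for the other columns in the real data that must pass through untouched.

```r
# Hypothetical fixed input: `id` represents the other columns in the real data
df <- data.frame(id = 1:4, var = c("a", "b", "a", "c"), stringsAsFactors = FALSE)

# Same loop as above: one 0/1 indicator column per distinct value of var
for (level in unique(df$var)) {
  df[paste("var", level, sep = "_")] <- ifelse(df$var == level, 1, 0)
}

df
#   id var var_a var_b var_c
# 1  1   a     1     0     0
# 2  2   b     0     1     0
# 3  3   a     1     0     0
# 4  4   c     0     0     1
```

This is the shape I'm after from the dplyr chain: the original columns are preserved and one new var_ column appears per level.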