I have a data frame with a mix of continuous and categorical data. The example below shows only the categorical columns.
df <- data.frame(gender = c("male", "female", "transgender"),
                 education = c("high-school", "grad-school", "home-school"),
                 smoke = c("yes", "no", "prefer not tell"))
> print(df)
       gender   education           smoke
1        male high-school             yes
2      female grad-school              no
3 transgender home-school prefer not tell
> str(df)
'data.frame': 3 obs. of 3 variables:
 $ gender   : chr "male" "female" "transgender"
 $ education: chr "high-school" "grad-school" "home-school"
 $ smoke    : chr "yes" "no" "prefer not tell"
I'm trying to recode the categorical columns as nominal codes. My current approach is quite tedious: first I convert all character variables to factors, and then I recode the levels of each column by hand.
# Coerce all character columns to factors, leaving other columns untouched
is_chr <- sapply(df, is.character)
df[is_chr] <- lapply(df[is_chr], as.factor)

# Recode the levels of each factor by hand
library(plyr)
df$gender <- revalue(df$gender, c("male" = "1", "female" = "2", "transgender" = "3"))
df$education <- revalue(df$education, c("high-school" = "1", "grad-school" = "2", "home-school" = "3"))
df$smoke <- revalue(df$smoke, c("yes" = "1", "no" = "2", "prefer not tell" = "3"))
> print(df)
  gender education smoke
1      1         1     1
2      2         2     2
3      3         3     3
Is there a more elegant way to approach this problem? Something in tidyverse style would be helpful. I have already seen somewhat similar questions like 1, 2, 3. The issue with those solutions is that they are either not relevant to what I need or they use base R approaches like lapply() or sapply(), which are difficult for me to interpret. I would also like to know whether there is an elegant, tidyverse-style way to convert all character variables to factors.
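To make the request concrete, here is a rough sketch of the style being asked about, not a definitive answer. It assumes dplyr 1.0 or later (for across() and where()); the names df_factor and df_coded are made up for illustration. The recoding step uses base factor() with explicit levels and labels inside mutate(); forcats::fct_recode() would be another option.

```r
library(dplyr)

df <- data.frame(
  gender    = c("male", "female", "transgender"),
  education = c("high-school", "grad-school", "home-school"),
  smoke     = c("yes", "no", "prefer not tell")
)

# Convert every character column to a factor in one step;
# non-character columns (e.g. continuous ones) are left as they are.
# df_factor is a hypothetical name used only in this sketch.
df_factor <- df %>%
  mutate(across(where(is.character), as.factor))

# Recode each column with an explicit mapping: levels lists the original
# values, labels gives the code each one should become (same order).
df_coded <- df %>%
  mutate(
    gender    = factor(gender,
                       levels = c("male", "female", "transgender"),
                       labels = c("1", "2", "3")),
    education = factor(education,
                       levels = c("high-school", "grad-school", "home-school"),
                       labels = c("1", "2", "3")),
    smoke     = factor(smoke,
                       levels = c("yes", "no", "prefer not tell"),
                       labels = c("1", "2", "3"))
  )
```

Spelling out the levels keeps the mapping independent of the alphabetical ordering that as.factor() applies by default, so the codes follow the order written in the call.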