I am trying to convert all my categorical variables to numeric ones. That is easy for the binary variables (Yes or No), but harder for categorical variables with more than two options. How can I convert the BusinessTravel variable, which has three options ("Non-Travel", "Rarely_Travel", and "Frequently_Travel"), to a numeric variable (0, 1, and 2)?

#---- load the dataset ----#
library(readr)
library(dplyr)  # provides select(), mutate(), and %>%
Dataset <- read_csv("Dataset.csv")

#---- select the important variables ----#
new_df <- select(Dataset, DistanceFromHome, MonthlyIncome, YearsAtCompany, 
                 Attrition, BusinessTravel, OverTime, JobInvolvement, 
                 StockOptionLevel, EnvironmentSatisfaction, JobLevel, 
                 Department)

#---- Transform the Variables ----#
new_df <- new_df %>%
  mutate(Attrition = ifelse(Attrition == "No", 0, 1),
         OverTime = ifelse(OverTime == "No", 0, 1))
         # BusinessTravel = ???

2 Answers

Try this, which converts every column to a factor (R stores each factor level as an integer code behind the scenes) and then extracts those underlying codes. Note that the codes start at 1, not 0, and follow the sorted order of the levels:

Data

df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
                 MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))

Code

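# Columns to convert; here all of them, but any index or name vector works
convert_cols <- seq_len(ncol(df))
# as.factor() assigns integer codes to the levels; as.numeric() extracts them
df[convert_cols] <- lapply(df[convert_cols], function(x) as.numeric(as.factor(x)))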

Output:

#   DistanceFromHome MonthlyIncome
# 1                1             1
# 2                2             3
# 3                3             1
# 4                2             2
# 5                1             3

Since you didn't provide sample data, I don't know whether this exact code will work on your dataset, but you can set convert_cols to whichever columns you want to convert (here I used all of them).
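
If the goal is specifically 0, 1, and 2 in a meaningful order, a minimal sketch is to pass the level order to factor() explicitly and subtract 1, since the default codes start at 1 and follow sorted order (which would put "Frequently_Travel" first). The sample values and the travel_levels name below are made up for illustration, and the least-to-most ordering is an assumption. The spellings must match your data exactly, because any value missing from levels becomes NA.

```r
# Hypothetical sample values; the real column would come from Dataset.csv.
new_df <- data.frame(
  BusinessTravel = c("Non-Travel", "Rarely_Travel", "Frequently_Travel", "Rarely_Travel")
)

# Assumed ordering, from least to most travel. Adjust to match your data.
travel_levels <- c("Non-Travel", "Rarely_Travel", "Frequently_Travel")

# factor() codes start at 1 in the order given by `levels`,
# so subtracting 1 yields 0, 1, 2.
new_df$BusinessTravel <- as.numeric(factor(new_df$BusinessTravel, levels = travel_levels)) - 1

new_df$BusinessTravel
# [1] 0 1 2 1
```

The same expression should also fit inside the existing mutate() call, as BusinessTravel = as.numeric(factor(BusinessTravel, levels = travel_levels)) - 1.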

jpsmith

data.matrix() might do the job: it converts character columns to factors and then to their integer codes. Borrowing the worked example from jpsmith's solution:

df <- data.frame(DistanceFromHome = c("Close", "Far", "Medium", "Far", "Close"),
                 MonthlyIncome = c("<1000", "1000-5000", "<1000", ">5000", "1000-5000"))

data.matrix(df)

Gives:

> data.matrix(df)
     DistanceFromHome MonthlyIncome
[1,]                1             1
[2,]                2             3
[3,]                3             1
[4,]                2             2
[5,]                1             3
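
Like as.factor(), data.matrix() numbers character columns from 1 in sorted order. It does keep the level order of columns that are already factors, so one possible way to get 0, 1, 2 is to set the levels first and subtract 1 afterwards. This is only a sketch: the sample rows are invented, and subtracting 1 from the whole matrix is only appropriate when every column is categorical.

```r
# Hypothetical sample rows using the column names from the question.
df <- data.frame(
  BusinessTravel = c("Non-Travel", "Rarely_Travel", "Frequently_Travel"),
  OverTime = c("No", "Yes", "No")
)

# Fix the level order up front (assumed here: least to most travel);
# data.matrix() replaces factors by their internal codes.
df$BusinessTravel <- factor(
  df$BusinessTravel,
  levels = c("Non-Travel", "Rarely_Travel", "Frequently_Travel")
)

# Codes start at 1, so shift everything down by 1 and
# turn the resulting matrix back into a data frame.
result <- as.data.frame(data.matrix(df) - 1)

result
#   BusinessTravel OverTime
# 1              0        0
# 2              1        1
# 3              2        0
```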
Paul Stafford Allen