I am new to R. My dataset contains a categorical variable, "importance", with three categories: "High", "Medium", and "Low". There are 1000 observations in total, of which 150 are NA. I want to label-encode this variable as "High"=0, "Medium"=1, "Low"=2, and also encode NA as 3. What I've done so far:

Data$importance=as.numeric(Data$importance)

but this fails to encode NA. In Python there is LabelEncoder for this. Is there a comparable package in R? If not, what is the most straightforward way to do this for several categorical variables?

Bits

2 Answers

df  = data.frame(label=c("Low","High","Medium",NA,"High"))
df$importance = match(df$label, c("High", "Medium", "Low", NA)) - 1
df
#     label importance
# 1    Low          2
# 2   High          0
# 3 Medium          1
# 4   <NA>          3
# 5   High          0
Gregory Demin
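
Since the question asks about several categorical variables, the same match() idea can be wrapped in a small helper and applied column by column. This is only a sketch: the helper name encode_labels and the second column urgency are made up for illustration and are not from the original post, and it assumes every column shares the same set of categories.

```r
# Hypothetical helper: zero-based position of each value in `levels`.
# as.character() lets it accept factor columns as well as character ones.
encode_labels <- function(x, levels) {
  match(as.character(x), levels) - 1L
}

df <- data.frame(
  importance = c("Low", "High", "Medium", NA, "High"),
  urgency    = c("Medium", NA, "Low", "High", "Low"),  # hypothetical column
  stringsAsFactors = TRUE
)

lvls <- c("High", "Medium", "Low", NA)   # NA listed last, so it becomes 3
cols <- c("importance", "urgency")

# Encode every listed column in place
df[cols] <- lapply(df[cols], encode_labels, levels = lvls)
df
```

A value that is not in lvls comes back as NA rather than raising an error, so it is worth checking anyNA(df[cols]) afterwards.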

You can also do the encoding in the following way:

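# Keep the column as character so replace() can overwrite values freely
y <- data.frame(importance = c("high", "low", "medium", "NA"), stringsAsFactors = FALSE)
y$importance <- replace(y$importance, y$importance == "high", 0)
y$importance <- replace(y$importance, y$importance == "medium", 1)
y$importance <- replace(y$importance, y$importance == "low", 2)
# This matches the literal string "NA"; a real missing value needs is.na(y$importance)
y$importance <- replace(y$importance, y$importance == "NA", 3)
# replace() leaves a character column ("0", "1", ...), so convert it to numeric
y$importance <- as.numeric(y$importance)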
Harshit Mehta
  • Does this also work for categorical variables whose data type is 'factor'? – Bits Jul 31 '17 at 19:24
  • For categorical variables of type "factor", you can keep them as character by creating the data frame with stringsAsFactors = FALSE; otherwise replace will generate a warning and won't give the desired result. You can also convert existing factor variables to character using as.character. – Harshit Mehta Jul 31 '17 at 19:32
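
On the factor question raised in the comments, one possible alternative is to stay with factors and control the level order directly. By default as.numeric() on a factor returns codes in alphabetical level order (High=1, Low=2, Medium=3) and leaves NA as NA, which is why the attempt in the question does not give the wanted mapping. The sketch below uses base R's factor() and addNA(); the variable names are illustrative only.

```r
x <- factor(c("Low", "High", "Medium", NA, "High"))

# Set the level order explicitly, then append NA as a final level
f <- addNA(factor(x, levels = c("High", "Medium", "Low")))

# Factor codes start at 1, so subtract 1 to get High=0 ... NA=3
codes <- as.integer(f) - 1L
codes
# [1] 2 0 1 3 0
```

This works the same way whether x starts out as a factor or as a character vector.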