
I am working on a project that requires me to one-hot encode a single variable, and I cannot seem to do it correctly.

I simply want to one-hot encode the variable data$Ratings so that the values 1, 2 and 3 are separated into their own columns in the data frame, each equal to either 0 or 1. E.g., if data$Ratings = 3, then the dummy column for 3 would be 1 (and the others 0). All the other columns should stay unchanged.

data <- structure(list(ID = c(284921427, 284926400, 284946595, 285755462, 
285831220, 286210009, 286313771, 286363959, 286566987, 286682679
), AUR = c(4, 3.5, 3, 3.5, 3.5, 3, 2.5, 2.5, 2.5, 2.5), URC = c(3553, 
284, 8376, 190394, 28, 47, 35, 125, 44, 184), Price = c(2.99, 
1.99, 0, 0, 2.99, 0, 0, 0.99, 0, 0), AgeRating = c(1, 1, 1, 1, 
1, 1, 1, 1, 1, 1), Size = c(15853568, 12328960, 674816, 21552128, 
34689024, 48672768, 6328320, 64333824, 2657280, 1466515), HasSubtitle = c(0, 
0, 0, 0, 0, 1, 0, 0, 0, 0), InAppSum = c(0, 0, 0, 0, 0, 1.99, 
0, 0, 0, 0), InAppMin = c(0, 0, 0, 0, 0, 1.99, 0, 0, 0, 0), InAppMax = c(0, 
0, 0, 0, 0, 1.99, 0, 0, 0, 0), InAppCount = c(0, 0, 0, 0, 0, 
1, 0, 0, 0, 0), InAppAvg = c(0, 0, 0, 0, 0, 1.99, 0, 0, 0, 0), 
descriptionTermCount = c(263, 204, 97, 272, 365, 368, 113, 
129, 61, 87), LanguagesCount = c(17, 1, 1, 17, 15, 1, 0, 
1, 1, 1), EngSupported = c(2, 2, 2, 2, 2, 2, 1, 2, 1, 2), 
GenreCount = c(2, 2, 2, 2, 3, 3, 3, 2, 3, 2), months = c(7, 
7, 7, 7, 7, 7, 7, 8, 8, 8), monthsSinceUpdate = c(29, 17, 
25, 29, 15, 6, 71, 12, 23, 134), GameFree = c(0, 0, 0, 0, 
0, 1, 0, 0, 0, 0), Ratings = c(3, 3, 3, 3, 2, 3, 2, 3, 2, 
3)), row.names = c(NA, 10L), class = "data.frame")

install.packages("mlbench")
install.packages("neuralnet")
install.packages("mltools")


library(mlbench)
library(dplyr)
library(caret)
library(mltools)
library(tidyr)
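data2 <- mutate_if(data, is.factor, as.numeric)  # factor columns -> their integer level codes
data3 <- lapply(data2, function(x) as.numeric(as.character(x)))  # every column -> numeric (returns a list)
data <- data.frame(data3)  # list -> data frame again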
summary(data)
head(data)
str(data)
View(data)

#
dput(head(data, 10))
data %>% mutate(value = 1)  %>% spread(data$Ratings, value,  fill = 0 )
Helix123
Lazlow10-4

1 Answer


Is this what you want? I will assume your data frame is called data and continue with the data you supplied:

library(plm)
plm::make.dummies(data$Ratings) # returns a matrix
##   2 3
## 2 1 0
## 3 0 1

# returns the full data frame with dummies added:
plm::make.dummies(data, col = "Ratings")
## [not printed to save space]
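If you would rather not add plm as a dependency, base R's model.matrix() can build the same kind of dummy columns. This is a base R alternative, not what plm does internally. For brevity it uses only the ID and Ratings columns from the question's data, and the names ratings, dummies and data_onehot are illustrative. Declaring levels = 1:3 assumes that 1, 2 and 3 are the only possible ratings; it makes a Ratings_1 column appear even though no row in the sample has a rating of 1.

```r
# Subset of the question's data: ID and Ratings only
data <- data.frame(
  ID = c(284921427, 284926400, 284946595, 285755462, 285831220,
         286210009, 286313771, 286363959, 286566987, 286682679),
  Ratings = c(3, 3, 3, 3, 2, 3, 2, 3, 2, 3)
)

# Assumption: ratings can only be 1, 2 or 3; listing all three levels
# keeps a column for 1 even though it does not occur in this sample
ratings <- factor(data$Ratings, levels = 1:3)

# "- 1" removes the intercept, so every level gets its own 0/1 column
# (a full one-hot encoding). model.matrix() drops rows with a missing
# rating, so this assumes Ratings has no NAs.
dummies <- model.matrix(~ ratings - 1)
colnames(dummies) <- paste0("Ratings_", levels(ratings))

# Append the dummy columns; the existing columns are left unchanged
data_onehot <- cbind(data, dummies)
head(data_onehot)
```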

There are some options for plm::make.dummies: e.g., you can select the base (reference) category via the base argument, and you can choose whether to include the base (base.add = TRUE) or not (base.add = FALSE).
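To illustrate what including or excluding a base category means, here is the same idea in base R. It does not call plm, the variable names are made up, and choosing 3 as the reference level is arbitrary. Dropping the intercept gives one column per level; keeping it makes model.matrix() omit the column for the reference level, which is set with relevel().

```r
ratings <- factor(c(3, 3, 3, 3, 2, 3, 2, 3, 2, 3), levels = 1:3)

# Full set: one 0/1 column per level (comparable to including the base)
full_set <- model.matrix(~ ratings - 1)

# Treatment coding with 3 as the reference level: the intercept absorbs
# the base, so no column is created for level 3 (comparable to excluding it)
ratings_ref3 <- relevel(ratings, ref = "3")
reference_coded <- model.matrix(~ ratings_ref3)

colnames(full_set)
colnames(reference_coded)
```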

The help page ?plm::make.dummies has more examples and explanation, as well as a comparison of LSDV (least squares dummy variable) model estimation using a factor variable versus explicitly self-created dummies.
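For comparison, the spread() attempt in the question can also be made to work. spread() (and its successor pivot_wider()) expects a bare column name such as Ratings rather than the vector data$Ratings. The sketch below uses pivot_wider() and assumes tidyr 1.1.0 or later, where values_fill accepts a single value. Copying Ratings into a helper column (named key here, purely illustrative) keeps the original Ratings column, as the question requires. Note that this approach only creates columns for ratings that actually occur, so the sample yields Ratings_3 and Ratings_2 but no Ratings_1.

```r
library(dplyr)
library(tidyr)

# Subset of the question's data: ID and Ratings only
data <- data.frame(
  ID = c(284921427, 284926400, 284946595, 285755462, 285831220,
         286210009, 286313771, 286363959, 286566987, 286682679),
  Ratings = c(3, 3, 3, 3, 2, 3, 2, 3, 2, 3)
)

data_onehot <- data %>%
  # copy Ratings into a helper column so the original Ratings column survives
  mutate(key = Ratings, value = 1) %>%
  # one column per distinct rating; combinations that do not occur become 0
  pivot_wider(names_from = key, values_from = value,
              values_fill = 0, names_prefix = "Ratings_")

data_onehot
```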

Helix123