I have a dataset in which every variable is categorical, and I would like to one-hot encode it for further analysis.
Main issues I would like to resolve:
- Some cells hold several text values in a single cell (an example follows).
- Some numeric values need to be converted to factors before further processing.
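To make the two issues concrete, here is a minimal base R sketch of each step in isolation. The values in `cell` and `age` are illustrative stand-ins for one cell and one column of the data, and pulling out the quoted pieces with a regular expression is only one possible way to split such a cell.

```r
# Illustrative single cell holding several quoted words (stand-in value).
cell <- 'c("good", "bad", "sad"'

# Pull out every double-quoted piece, then drop the quote characters.
quoted <- regmatches(cell, gregexpr('"[^"]*"', cell))[[1]]
words  <- gsub('"', "", quoted, fixed = TRUE)
words

# Numeric codes are treated as categories once converted to a factor;
# levels = unique(age) keeps the levels in order of first appearance.
age   <- c(99L, 10L, 40L, 15L)
age_f <- factor(age, levels = unique(age))
levels(age_f)
```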
The data has three columns, Age, Info and Target:
mydf <- structure(list(Age = c(99L, 10L, 40L, 15L), Info = c("c(\"good\", \"bad\", \"sad\"",
"c(\"nice\", \"happy\", \"joy\"", "NULL", "c(\"okay\", \"nice\", \"fun\", \"wild\", \"go\""
), Target = c("Boy", "Girl", "Boy", "Boy")), .Names = c("Age",
"Info", "Target"), row.names = c(NA, 4L), class = "data.frame")
I want to one-hot encode all of the variables above so that the result looks like this (first two rows shown):
Age_99 Age_10 Age_40 Age_15 good bad sad nice happy joy null okay nice fun wild go Boy Girl
1      0      0      0      1    1   1   0    0     0   0    0    0    0   0    0  1   0
0      1      0      0      0    0   0   1    1     1   0    0    0    0   0    0  0   1
Some of the questions on SO I have checked are this and this.
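For reference, a minimal base R sketch of one way to reach roughly that layout. It rests on assumptions that are not settled in the question: a repeated word such as `nice` gets a single shared column, a cell with no quoted words is flagged in a `null` column, and `one_hot()` is a hypothetical helper name, not a function from any package.

```r
# Same data as mydf above, rebuilt so the sketch runs on its own.
mydf <- data.frame(
  Age    = c(99L, 10L, 40L, 15L),
  Info   = c('c("good", "bad", "sad"',
             'c("nice", "happy", "joy"',
             'NULL',
             'c("okay", "nice", "fun", "wild", "go"'),
  Target = c("Boy", "Girl", "Boy", "Boy"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 column per distinct value of a column.
one_hot <- function(x, prefix = "") {
  x  <- as.character(x)
  lv <- unique(x)                    # values in order of first appearance
  m  <- outer(x, lv, "==") + 0L      # logical matrix turned into 0/1 integers
  colnames(m) <- paste0(prefix, lv)
  m
}

# Split each Info cell into its quoted words.
quoted <- regmatches(mydf$Info, gregexpr('"[^"]*"', mydf$Info))
tokens <- lapply(quoted, function(q) gsub('"', "", q, fixed = TRUE))
# Assumption: cells without any quoted word are marked as "null".
tokens[lengths(tokens) == 0] <- list("null")

# One column per distinct word; a row gets 1 for every word it contains.
vocab    <- unique(unlist(tokens))
info_hot <- t(vapply(tokens,
                     function(tk) as.integer(vocab %in% tk),
                     integer(length(vocab))))
colnames(info_hot) <- vocab

encoded <- cbind(one_hot(mydf$Age, "Age_"), info_hot, one_hot(mydf$Target))
encoded
```

Building the columns by hand keeps them in order of first appearance, which matches the header order shown above. A formula-based route such as `model.matrix()` on factors would also work for the single-valued columns, but by default it orders columns by factor level.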