I am trying to one-hot encode the character data frame below in R:
x1 <- c('')
x2 <- c('A1,A2')
x3 <- c('A2,A3,A4')
test <- as.data.frame(rbind(x1,x2,x3))
I want to bring the data into the following format:
x1 <- c(0,0,0,0)
x2 <- c(1,1,0,0)
x3 <- c(0,1,1,1)
result <- as.data.frame(rbind(x1,x2,x3))
names(result) = c('A1','A2','A3','A4')
The delimiter is a comma, and I can split on it using:
test$V1 = as.character(test$V1)
split_list = strsplit(test$V1, ",")
This gives me a list of character vectors, which cannot be coerced directly into a data frame. Is there a better way of doing this? I tried out OneHotEncoder.fit from the CatEncoders package (https://www.rdocumentation.org/packages/CatEncoders/versions/0.1.0/topics/OneHotEncoder.fit), but it spreads a single column rather than multiple columns, as this case requires.
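For reference, here is a minimal base-R sketch of the transformation I am after, built on the strsplit() output. The name levels_all is just an illustrative helper of mine, and I am assuming the columns should be the sorted set of distinct labels; this may well not be the most idiomatic route.

```r
# Rebuild the example data from above
x1 <- c('')
x2 <- c('A1,A2')
x3 <- c('A2,A3,A4')
test <- as.data.frame(rbind(x1, x2, x3), stringsAsFactors = FALSE)

# Split each row on the comma; an empty string yields character(0)
split_list <- strsplit(as.character(test$V1), ",")

# Distinct labels across all rows, sorted to fix the column order
# (levels_all is a hypothetical helper name, not from any package)
levels_all <- sort(unique(unlist(split_list)))

# For each row, mark each label as present (1) or absent (0)
result <- t(vapply(split_list,
                   function(items) as.integer(levels_all %in% items),
                   integer(length(levels_all))))
result <- as.data.frame(result)
names(result) <- levels_all
rownames(result) <- rownames(test)
```

On the three example rows this should reproduce the target layout, with one indicator column per label and an all-zero row for the empty string.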