I am having trouble converting a matrix to a one-hot encoding in R. I implemented this in Matlab, but I have difficulty handling the object in R. Here I have an object of type 'matrix'.

Matrix having certain values

I would like to apply one-hot encoding to every column of this matrix. My problem is with the column names.

Here is an example:

> set.seed(4)
> t <- matrix(floor(runif(10, 1,9)),5,5)
> t

      [,1] [,2] [,3] [,4] [,5]
[1,]    5    3    5    3    5
[2,]    1    6    1    6    1
[3,]    3    8    3    8    3
[4,]    3    8    3    8    3
[5,]    7    1    7    1    7
> class(t)
[1] "matrix"

Expecting:

      1_1 1_3 1_5 1_7  2_1 2_3 2_6 2_8 ...
[1,]   0   0   1   0    0   1   0   0  ...
[2,]   1   0   0   0    0   0   1   0  ...
[3,]   0   1   0   0    0   0   0   1  ...
[4,]   0   1   0   0    0   0   0   1  ...   
[5,]   0   0   0   1    1   0   0   0  ...

I tried the following, but the matrix remains the same.

library(data.table)
library(mltools)
test_table <- one_hot(as.data.table(t))

Any suggestions would be very much appreciated.

sveer
  • Hi sveer. Please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Also, show what you have already tried. That way you can help others to help you! – dario Feb 17 '20 at 13:31
  • Hi Dario, thank you for your suggestion. I added an example. I have not tried much, since the examples available online have named columns and translate only one column into a one-hot encoding. In my case the columns have no names, and I would like to translate every column into a one-hot encoding. My issue is more with handling the basic R object (matrix) than with the concept of one-hot encoding itself. – sveer Feb 17 '20 at 14:09

2 Answers


Your data table must contain some columns (variables) that have class "factor". Try this:

> t <- data.table(t)
> t[,V1:=factor(V1)]
> one_hot(t)
   V1_1 V1_3 V1_5 V1_7 V2 V3 V4 V5
1:    0    0    1    0  3  5  3  5
2:    1    0    0    0  6  1  6  1
3:    0    1    0    0  8  3  8  3
4:    0    1    0    0  8  3  8  3
5:    0    0    0    1  1  7  1  7

I have also read that the dummyVars function from the caret package is quicker if your matrix is large.
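For reference, a minimal sketch of the dummyVars route, assuming the caret package is installed. The names `m`, `df`, `dv`, and `m_hot` are illustrative and not from the original post, and no timing was done here to check the speed claim.

```r
library(caret)

set.seed(4)
m <- matrix(floor(runif(10, 1, 9)), 5, 5)

# dummyVars only expands factor columns, so convert every column first
df <- as.data.frame(m)           # columns are named V1..V5
df[] <- lapply(df, factor)

dv <- dummyVars(~ ., data = df)  # one indicator column per factor level
m_hot <- predict(dv, newdata = df)

dim(m_hot)
```

Note that the generated column names follow caret's own naming scheme rather than the `1_5` style asked for in the question, so they may need renaming afterwards.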

Edit: Forgot to set the seed. :P

And a quick way to factor all variables in a data table:

t.f <- t[, lapply(.SD, as.factor)]
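
Putting the steps together for the matrix from the question, a minimal sketch might look like the following. It assumes data.table and mltools are installed; the names `m`, `dt`, `dt_f`, and `m_hot` are illustrative (chosen so that base R's `t()` is not masked).

```r
library(data.table)
library(mltools)

set.seed(4)
m <- matrix(floor(runif(10, 1, 9)), 5, 5)

dt   <- as.data.table(m)               # unnamed columns become V1..V5
dt_f <- dt[, lapply(.SD, as.factor)]   # one_hot() only encodes factor columns
m_hot <- as.matrix(one_hot(dt_f))      # back to a plain matrix

colnames(m_hot)
```

The resulting columns are named like `V1_1`, `V1_3`, and so on, as in the output shown earlier; stripping the leading `V` with `sub("^V", "", colnames(m_hot))` would give the `1_1` style from the question.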
Edward
  • I read that page as well. Thank you for your quick response. This method translated only the first column, and factoring all the variables throws an error stating `"Invalid Index type ''List"` – sveer Feb 17 '20 at 14:44
  • You sure? Works for me. Just copy and paste my commands (delete the `>`). If you type them yourself, you may make a typo, such as .sd instead of .SD. Let me know ... – Edward Feb 17 '20 at 14:48
  • You may have forgotten to convert the matrix to a data.table. ;-) – Edward Feb 17 '20 at 14:50
  • Yes indeed, I did not convert the matrix to a data.table. My bad. Thanks for pointing it out; it works now. Greetings – sveer Feb 17 '20 at 14:57
  • Although both solutions worked, dario's solution is more flexible, so I accepted it. I hope that's okay with you. – sveer Feb 17 '20 at 15:01

There are probably more concise ways to do this, but this should work (and is at least easy to read and understand ;))

Suggested solution using base R and double loop:

set.seed(4)  
t <- matrix(floor(runif(10, 1,9)),5,5)

# initialize result object
#
t_hot <- NULL

# for each column in original matrix
#
for (col in seq_len(ncol(t))) {
  # for each unique value in this column (sorted so the resulting
  # columns appear in order)
  #
  for (val in sort(unique(t[, col]))) {
    t_hot <- cbind(t_hot, ifelse(t[, col] == val, 1, 0))
    # make name for this column
    #
    colnames(t_hot)[ncol(t_hot)] <- paste0(col, "_", val)
  }
}

This returns:

     1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
[1,]   0   0   1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0   1   0
[2,]   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0   0
[3,]   0   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0
[4,]   0   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0
[5,]   0   0   0   1   1   0   0   0   0   0   0   1   1   0   0   0   0   0   0   1
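
As one of the more concise variants alluded to above, here is a sketch that builds each column's indicator block with `outer()` and binds the blocks once at the end, instead of growing the result inside a loop. The function name `one_hot_matrix` and the variable `m` are made up for illustration.

```r
# Hypothetical helper: one-hot encode every column of a matrix
one_hot_matrix <- function(m) {
  blocks <- lapply(seq_len(ncol(m)), function(j) {
    vals <- sort(unique(m[, j]))
    # outer() compares each row value with each unique value;
    # multiplying by 1 turns TRUE/FALSE into 1/0
    block <- outer(m[, j], vals, "==") * 1
    colnames(block) <- paste0(j, "_", vals)
    block
  })
  do.call(cbind, blocks)
}

set.seed(4)
m <- matrix(floor(runif(10, 1, 9)), 5, 5)
m_hot <- one_hot_matrix(m)
```

This should give the same columns, in the same order, as the double loop; each row contains exactly one 1 per original column.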
dario