
I don't know of a better way to express the title.

I have one-hot encoded a given matrix. Here is an example:

> set.seed(4)
> t <- matrix(floor(runif(10, 1,9)),5,5)

      [,1] [,2] [,3] [,4] [,5]
[1,]    5    3    5    3    5
[2,]    1    6    1    6    1
[3,]    3    8    3    8    3
[4,]    3    8    3    8    3
[5,]    7    1    7    1    7
> class(t)
[1] "matrix"

      1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
[1,]   0   0   1   0   0   1   0   0   0   0   1   0   0   1   0   0   0   0   1   0
[2,]   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0   0
[3,]   0   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0
[4,]   0   1   0   0   0   0   0   1   0   1   0   0   0   0   0   1   0   1   0   0
[5,]   0   0   0   1   1   0   0   0   0   0   0   1   1   0   0   0   0   0   0   1
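
The encoding step itself is not shown above. As a rough sketch only, one way such a matrix could be produced is shown below. `one_hot` is a hypothetical helper name, not something from the original post, and the column naming `<column index>_<value>` is inferred from the output above.

```r
# Hypothetical helper (not from the original post): one-hot encode every
# column of a numeric matrix, naming the result "<column index>_<value>".
one_hot <- function(m) {
  blocks <- lapply(seq_len(ncol(m)), function(j) {
    lev <- sort(unique(m[, j]))                      # levels seen in column j
    # Row i of the identity matrix is the indicator vector for level i.
    block <- diag(length(lev))[match(m[, j], lev), , drop = FALSE]
    colnames(block) <- paste(j, lev, sep = "_")
    block
  })
  do.call(cbind, blocks)
}

# Same values as the example matrix, written out instead of using runif().
oldt <- matrix(rep(c(5, 1, 3, 3, 7, 3, 6, 8, 8, 1), length.out = 25), 5, 5)
enc <- one_hot(oldt)
enc
```

Indexing an identity matrix by `match()` only works when every value is among the levels, which holds here because the levels come from the same matrix.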

I have been struggling to transform another matrix, such as the one below, into the expected form.

     [,1] [,2] [,3] [,4] [,5]
[1,]    7    4    8    1    3
[2,]    3    7    4    8    1
[3,]    1    3    7    4    8
[4,]    8    1    3    7    4

I expect the following: the columns stay the same as in the previously encoded matrix, but they are filled with 0s and 1s according to the values in the new matrix.

      1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
[1,]   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0
[2,]   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0
[3,]   1   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   0
[4,]   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0

Since the column names differ from the values in the new matrix, I do not know how to check the new column values against the old column names.
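
To make the comparison concrete: each old column name can be split at the underscore into a source column index and a value, and those two pieces can then be checked against the new matrix. The snippet below is only a sketch, assuming every name follows the `<column index>_<value>` pattern. `cn`, `newt` and `enc_new` are illustrative names, and only the first two columns of the new matrix are used to keep it short.

```r
# Column names copied from the encoded matrix above (first 8 only).
cn <- c("1_1", "1_3", "1_5", "1_7", "2_1", "2_3", "2_6", "2_8")

# Split "1_5" into column index 1 and value 5.
parts   <- do.call(rbind, strsplit(cn, "_", fixed = TRUE))
col_idx <- as.integer(parts[, 1])   # column of the original matrix
value   <- as.numeric(parts[, 2])   # value the indicator column stands for

# First two columns of the new matrix from the question.
newt <- matrix(c(7, 3, 1, 8,  4, 7, 3, 1), 4, 2)

# For each encoded column k, flag the rows where newt[, col_idx[k]] equals value[k].
enc_new <- vapply(seq_along(cn),
                  function(k) as.numeric(newt[, col_idx[k]] == value[k]),
                  numeric(nrow(newt)))
colnames(enc_new) <- cn
enc_new
```

Values in the new matrix that have no matching old column name (such as 8 in the first column) simply leave their row all zeros within that block.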

Any suggestion or hint would help a great deal. I struggled with this the whole weekend.

sveer
  • How are you determining the values i.e. `1_5` as column name. The value is not found in the second matrix – akrun Feb 23 '20 at 21:19
  • @akrun the column name was given based on the first matrix. `1_5` means the `1st` column, and `5` is one of the unique values in that column. The second matrix does not necessarily contain the values that the first matrix has, which is why the `1_5` column is all zeros in the new expected matrix. – sveer Feb 23 '20 at 21:26
  • In the answer https://stackoverflow.com/a/60264578/12158757, I guess you can add one more line `t <- unique(t)` before running the nested `for` loops to get the expected result – ThomasIsCoding Feb 23 '20 at 21:36

1 Answer


Here, we split 't' and 'oldt' column-wise with asplit, specifying MARGIN as 2, then use Map to pass the corresponding list elements along with the sequence of column indices (seq_len(ncol(t))). Inside Map, an anonymous function (function(x, y, z), where x, y and z are the column values of 't', the column values of 'oldt' and the column index) does the following. It creates a sorted vector of the unique values in the 'oldt' column ('y1') and a matrix of 0s to store the output ('m1'), whose column names are 'z' and 'y1' pasted together. It then sorts the values common to the 't' and 'oldt' columns ('v1'), gets the row position of each of those values in the 't' column ('i1'), and gets the column position ('j1') by matching the pasted 'z' and 'v1' against the column names of 'm1'. Finally, using that row/column index, it replaces those positions in 'm1' with 1. The per-column matrices are bound together with do.call(cbind, ...).

do.call(cbind, Map(function(x, y, z) {
    y1 <- sort(unique(y))
    m1 <- matrix(0, length(x), length(y1), dimnames = list(NULL, paste(z, y1, sep="_")))
    v1 <- sort(intersect(x, y))
    i1 <- match(v1, x)
    j1 <- match(paste(z, v1, sep="_"), colnames(m1))
    replace(m1, cbind(i1, j1), 1) }, asplit(t, 2), asplit(oldt, 2), seq_len(ncol(t))))
#     1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
#[1,]   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0
#[2,]   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0
#[3,]   1   0   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   0
#[4,]   0   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
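
One caveat: match(v1, x) returns only the first position of each value, so if a value occurs more than once within a column of 't', only its first row gets a 1. The data here has no repeats within a column, so it does not matter for this example. If repeats are possible, a minimal sketch of a variant that compares every row against every level with outer is below. `encode_like` is a hypothetical name, and the small `old`/`new` matrices are made-up values chosen to include repeats and an unseen value.

```r
# Hypothetical helper: encode `new` using the per-column levels of `old`.
encode_like <- function(new, old) {
  stopifnot(ncol(new) == ncol(old))
  blocks <- lapply(seq_len(ncol(old)), function(j) {
    lev <- sort(unique(old[, j]))             # levels come from the OLD column
    # TRUE where row value == level; multiply by 1 to get 0/1 numerics.
    block <- outer(new[, j], lev, "==") * 1
    colnames(block) <- paste(j, lev, sep = "_")
    block
  })
  do.call(cbind, blocks)
}

old <- matrix(c(5, 1, 3, 3, 7,  3, 6, 8, 8, 1), 5, 2)
new <- matrix(c(3, 3, 9, 1,  8, 2, 8, 6), 4, 2)  # made-up: 3 and 8 repeat, 9 and 2 are unseen
res <- encode_like(new, old)
res
```

Rows holding a value that never appeared in the old column stay all zero within that column's block, which is the behaviour asked for in the question.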

data

oldt <- structure(c(5, 1, 3, 3, 7, 3, 6, 8, 8, 1, 5, 1, 3, 3, 7, 3, 6, 
8, 8, 1, 5, 1, 3, 3, 7), .Dim = c(5L, 5L))

t <- structure(c(7, 3, 1, 8, 4, 7, 3, 1, 8, 4, 7, 3, 1, 8, 4, 7, 3, 
1, 8, 4), .Dim = 4:5)
akrun