How can I code this indicator matrix without using a for loop in R

Question

I have a vector of factors given by a sequence of numbers. These factors also appear in two separate data sets, called test_set and train_set. The following code finds where the factor in each data set matches an entry in the vector of factors and puts a 1 in the corresponding position of a matrix. Multiplying this matrix compound_test by test_set$Compound should give you compare_comp.

compare_comp <- rbind(dcm, cmp1)[, 1]
compound_test <- matrix(0, nrow(test_set), length(compare_comp))   # test indicator matrix
compound_train <- matrix(0, nrow(train_set), length(compare_comp)) # train indicator matrix

for (i in seq_along(compare_comp)) {
  # set column i to 1 in every row whose Compound equals compare_comp[i]
  compound_test[which(compare_comp[i] == test_set$Compound), i] <- 1
  compound_train[which(compare_comp[i] == train_set$Compound), i] <- 1
}

It does this for a train and test set, and compare_comp is the vector of factors.

Is there a function in R that lets me create the same thing without the need for a for loop? I have tried model.matrix(~Compound,data=test_set) without much luck.

I'm having a hard time understanding what you're trying to do from the words and code snippet you've shared. Can you make this into a reproducible example? — ulfelder, Dec 19 '19 at 11:15
Usually, human languages are not too precise. Please provide us [reproducible input](https://stackoverflow.com/q/5963269/1422451), specifically, samples of `dcm`, `cmp1`, `test_set`, `train_set`. Then show us with data the expected output. — Parfait, Dec 19 '19 at 15:29
I have amended the question with reproducible examples here: https://stackoverflow.com/questions/59413766/how-can-i-code-this-indicator-matrix-without-using-a-loop-in-r — Expectation mean first moment, Dec 19 '19 at 16:40

score 0 · Answer 1 · answered Dec 19 '19 at 17:03

You may not be able to avoid iteration completely, since each element of the compare_comp vector has to be compared against the full Compound vector in both test_set and train_set. You can, however, write the assignment more compactly with the apply family of functions.

Specifically, sapply returns a logical matrix (TRUE, FALSE) with one column per element of compare_comp. Assigning it into the corresponding positions of an initialized numeric matrix converts TRUE to 1 and FALSE to 0.

# SAPPLY AFTER MATRIX INITIALIZATION
compound_test2 <- matrix(0, nrow(test_set), length(compare_comp)) 
compound_train2 <- matrix(0, nrow(train_set), length(compare_comp))

compound_test2[] <- sapply(compare_comp, function(x) x == test_set$Compound)
compound_train2[] <- sapply(compare_comp, function(x) x == train_set$Compound)
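
To make the coercion step concrete, here is a minimal sketch on made-up data. The values in compare_comp and test_set are hypothetical and not taken from the question. It shows that sapply yields a logical matrix, and that assigning it through `m[]` keeps the numeric storage mode and dimensions of the initialized matrix.

```r
# Hypothetical toy data (an assumption for illustration, not the asker's data)
compare_comp <- c(101, 102, 103)
test_set <- data.frame(Compound = c(102, 101, 103, 102))

# Initialized numeric matrix: one row per observation, one column per compound
m <- matrix(0, nrow(test_set), length(compare_comp))

# One logical column per element of compare_comp
hits <- sapply(compare_comp, function(x) x == test_set$Compound)

# Assigning through m[] fills the existing matrix, so TRUE/FALSE become 1/0
m[] <- hits

is.logical(hits)  # the sapply result is logical
is.double(m)      # the filled matrix is still numeric
m
```

Writing `m <- hits` without the empty brackets would instead replace the matrix with the logical one, which is why the brackets matter here.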

Alternatively, the lesser-known vapply (similar to sapply, but you must declare the type and length of each result) returns an equivalent matrix directly as numeric type.

# VAPPLY WITHOUT MATRIX INITIALIZATION
# FUN.VALUE must match the length of each result, i.e. the number of rows compared
compound_test3 <- vapply(compare_comp, function(x) x == test_set$Compound, 
                         numeric(nrow(test_set)))

compound_train3 <- vapply(compare_comp, function(x) x == train_set$Compound,
                          numeric(nrow(train_set)))

Testing with random data (see demo below) confirms that both versions are identical to your looped version (compound_test1 and compound_train1):

identical(compound_test1, compound_test2)
identical(compound_train1, compound_train2)         
# [1] TRUE
# [1] TRUE

identical(compound_test1, compound_test3)
identical(compound_train1, compound_train3)     
# [1] TRUE
# [1] TRUE
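
If the goal is to have no explicit iteration in your own code at all, one further option (not part of the comparison above, and only a sketch under assumptions) is base R's outer, which compares every pair of elements in a single call. The toy data below is hypothetical. Both inputs are converted with as.character so that factors with different level sets can still be compared.

```r
# Hypothetical toy data (an assumption for illustration, not the asker's data)
compare_comp <- c("A", "B", "C")
test_set <- data.frame(Compound = factor(c("B", "A", "C", "B", "D")))

# Loop version, as in the question
loop_mat <- matrix(0, nrow(test_set), length(compare_comp))
for (i in seq_along(compare_comp)) {
  loop_mat[which(compare_comp[i] == test_set$Compound), i] <- 1
}

# outer() builds the full rows-by-columns comparison at once;
# adding 0 turns the logical matrix into a numeric 0/1 matrix
outer_mat <- outer(as.character(test_set$Compound),
                   as.character(compare_comp), "==") + 0

identical(loop_mat, outer_mat)
```

Rows whose Compound does not occur in compare_comp (here "D") stay all zero in both versions.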

Online Demo
