I have a dataset with values 0, 1, and 2.

data <- matrix(c(1, 0, 0, 1, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1), nrow = 5, ncol = 4)
> data
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    2
[2,]    0    1    0    0
[3,]    0    0    0    1
[4,]    1    1    0    1
[5,]    2    0    0    1

I would like to create a matrix based on this data in which each value is expanded into a pair stacked vertically: 0 becomes (0, 0), 1 becomes (1, 0), and 2 becomes (0, 1). Below is the code that I'm using:

data.exp <- matrix(NA, nrow = nrow(data)*2, ncol = ncol(data))
for(i in 1:nrow(data)){
  for(j in 1:(ncol(data))){
    if(data[i,j] == 1){
      vec <- c(1, 0)
    }else if(data[i, j] == 0){
      vec <- c(0, 0)
    }else{
      vec <- c(0, 1)
    }
    data.exp[((i*2-1):(i*2)), j] <- vec
  }
}
> data.exp
      [,1] [,2] [,3] [,4]
 [1,]    1    1    0    0
 [2,]    0    0    0    1
 [3,]    0    1    0    0
 [4,]    0    0    0    0
 [5,]    0    0    0    1
 [6,]    0    0    0    0
 [7,]    1    1    0    1
 [8,]    0    0    0    0
 [9,]    0    0    0    1
[10,]    1    0    0    0

Is there a faster way to generate the matrix, data.exp, without having to use a nested for loop in R? As the sample size increases, the nested for loop approach is not very efficient.

Konrad Rudolph
Adrian

3 Answers

apply should be pretty fast for matrices. Create a list, v, holding the pair for each value, then use each entry of data plus 1 (R indexes from 1) as an index into v:

v = list(c(0, 0), c(1, 0), c(0, 1))
apply(data, 2, function(i) do.call(cbind, v[i + 1]))
#       [,1] [,2] [,3] [,4]
#  [1,]    1    1    0    0
#  [2,]    0    0    0    1
#  [3,]    0    1    0    0
#  [4,]    0    0    0    0
#  [5,]    0    0    0    1
#  [6,]    0    0    0    0
#  [7,]    1    1    0    1
#  [8,]    0    0    0    0
#  [9,]    0    0    0    1
# [10,]    1    0    0    0
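
To see why indexing v with i + 1 works, it may help to look at a single column in isolation. This is only an illustrative sketch; the names idx and col1 are introduced here and are not part of the answer above.

```r
data <- matrix(c(1, 0, 0, 1, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1), nrow = 5, ncol = 4)
v <- list(c(0, 0), c(1, 0), c(0, 1))

# First column of data holds 1 0 0 1 2; adding 1 turns the values into list positions
idx <- data[, 1] + 1

# Pick the matching pair for every entry and flatten the pairs into one vector
col1 <- unlist(v[idx])
col1
# [1] 1 0 0 0 0 0 1 0 0 1
```

This vector is the first column of data.exp; apply repeats the same lookup for every column.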
d.b
  • Slightly faster variant for a matrix of this size: `apply(data, 2, function(i) unlist(v[i + 1]))` – r2evans Feb 04 '22 at 22:52
  • `Error: cannot allocate vector of size 74.5 Gb`. This is the error that I've been getting when my `data` has 50,000 rows and 100,000 columns. Is there any way around this? – Adrian Feb 06 '22 at 07:00
Here is an option without an explicit loop:

t(
  matrix(
    scan(text = toString(c("0, 0", "1, 0", "0, 1")[data + 1]), sep = ","),
    byrow = TRUE, 
    nrow = ncol(data)
  )
)

which gives

      [,1] [,2] [,3] [,4]
 [1,]    1    1    0    0
 [2,]    0    0    0    1
 [3,]    0    1    0    0
 [4,]    0    0    0    0
 [5,]    0    0    0    1
 [6,]    0    0    0    0
 [7,]    1    1    0    1
 [8,]    0    0    0    0
 [9,]    0    0    0    1
[10,]    1    0    0    0

A more concise option (thanks to @akrun's contribution):

> matrix(unlist(list(c(0, 0), c(1, 0), c(0, 1))[data + 1]), nrow = nrow(data) * 2)
      [,1] [,2] [,3] [,4]
 [1,]    1    1    0    0
 [2,]    0    0    0    1
 [3,]    0    1    0    0
 [4,]    0    0    0    0
 [5,]    0    0    0    1
 [6,]    0    0    0    0
 [7,]    1    1    0    1
 [8,]    0    0    0    0
 [9,]    0    0    0    1
[10,]    1    0    0    0
ThomasIsCoding
  • @akrun Thank you so much, bro! That's fabulous! – ThomasIsCoding Feb 04 '22 at 23:58
  • `Error: cannot allocate vector of size 74.5 Gb`. This is the error that I've been getting when my `data` has 50,000 rows and 100,000 columns. Is there any way around this? – Adrian Feb 06 '22 at 07:00
  • @Adrian Your data is excessively big; that's why you get an error. You should look at this https://stackoverflow.com/a/9984368/12158757 – ThomasIsCoding Feb 07 '22 at 13:51
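
For what it's worth, the 74.5 Gb in the error message is consistent with the size of the expanded result alone when it is stored as doubles: 2 × 50,000 rows × 100,000 columns × 8 bytes. The arithmetic below is a rough sketch, not a measurement; it also shows the same estimate for integer (4 bytes) and raw (1 byte) storage. The helper gib and the variable names are introduced here for illustration.

```r
rows <- 50000
cols <- 100000

# Number of elements in the expanded matrix (twice as many rows as data)
n_out <- 2 * rows * cols

# Convert a byte count to GiB, the unit R reports as "Gb"
gib <- function(bytes) bytes / 2^30

sizes <- c(double = gib(n_out * 8), integer = gib(n_out * 4), raw = gib(n_out * 1))
round(sizes, 1)
#  double integer     raw 
#    74.5    37.3     9.3 
```

So a smaller storage type reduces the footprint but does not make it small. If the full result is not needed in memory at once, expanding a block of columns at a time may be an option.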
Start by making two matrices of the same dimensions as data, one which sets all of the 2s to 0 and the other which sets all of the 1s to 0 and all of the 2s to 1. Then interleave the two matrices row by row.

The first part is accomplished using ifelse; for the second part, flodel's answer to this question helps.

Putting it all together, you have

data <- matrix(c(1, 0, 0, 1, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1), nrow = 5, ncol = 4)

l <- list(ifelse(data < 2, data, 0),
          ifelse(data > 1, 1, 0))

do.call(rbind, l)[order(sequence(sapply(l, nrow))), ]

#       [,1] [,2] [,3] [,4]
#  [1,]    1    1    0    0
#  [2,]    0    0    0    1
#  [3,]    0    1    0    0
#  [4,]    0    0    0    0
#  [5,]    0    0    0    1
#  [6,]    0    0    0    0
#  [7,]    1    1    0    1
#  [8,]    0    0    0    0
#  [9,]    0    0    0    1
# [10,]    1    0    0    0
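
The same interleaving idea can also be sketched without rbind and order, by writing the two indicator matrices straight into the odd and even rows of a preallocated result. This is an alternative under the assumption that data contains only 0, 1, and 2; the names n and odd are introduced here for illustration.

```r
data <- matrix(c(1, 0, 0, 1, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1), nrow = 5, ncol = 4)

n <- nrow(data)

# Preallocate an integer matrix of zeros with twice as many rows as data
data.exp <- matrix(0L, nrow = 2 * n, ncol = ncol(data))

# Odd rows take the "is 1" indicator, even rows take the "is 2" indicator
odd <- seq(1, 2 * n, by = 2)
data.exp[odd, ] <- as.integer(data == 1)
data.exp[odd + 1, ] <- as.integer(data == 2)

data.exp
```

Storing the result as integers rather than doubles also halves its memory use, which may matter for larger inputs.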