I have a very large list that looks like this:

1
2
3
3

and I need to create a list that looks like this:

 |------|------|------|------|
 |   1  |   1  |   0  |   0  |
 |------|------|------|------|
 |   2  |   0  |   1  |   0  |
 |------|------|------|------|
 |   3  |   0  |   0  |   1  |
 |------|------|------|------|
 |   3  |   0  |   0  |   1  |
 |------|------|------|------|

I have tried using loops and the method described here:

Create mutually exclusive dummy variables from categorical variable in R

But because the dataset is so large, I run out of memory.

I am thinking of using a split-apply-combine technique, but I have not been able to get the desired result.

Help is much appreciated!

whackamadoodle3000
  • I think you need to give a bit more detail about the list you currently have, and how you expect to get your desired result. It will help immensely if you provide some data for others to work with, and an example of the expected outcome. – SymbolixAU Aug 20 '17 at 01:38
  • The size of your matrix will be length(MyList) * length(unique(MyList)). If that is too big for memory, your problem is not how to compute this matrix but rather how to represent this matrix. – G5W Aug 20 '17 at 01:40
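
To make the size argument in the comment above concrete, here is a rough back-of-the-envelope sketch. The values of `n` and `k` are hypothetical (the question does not state the real sizes), and the sparse estimate assumes the compressed column layout used by the Matrix package's "dgCMatrix" class.

```r
# Hypothetical sizes, chosen only for illustration
n <- 1e7   # stands in for length(MyList)
k <- 1e3   # stands in for length(unique(MyList))

# A dense numeric matrix stores 8 bytes for every cell, zeros included
dense_gb <- n * k * 8 / 1024^3

# A dgCMatrix stores, per nonzero entry, one double (8 bytes) and one
# integer row index (4 bytes), plus k + 1 integer column pointers.
# A dummy matrix has exactly one nonzero per row, so n nonzeros in total.
sparse_gb <- (n * (8 + 4) + (k + 1) * 4) / 1024^3

round(c(dense_gb = dense_gb, sparse_gb = sparse_gb), 2)
```

Under these assumed sizes the dense form needs tens of gigabytes while the sparse form stays well under one, which is why the choice of representation matters more than the method used to fill it.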

2 Answers

3

Here are some ways:

1) outer. This gives a dense matrix result:

x <- c(1, 2, 3, 3)
outer(x, unique(x), "==") + 0

giving:

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    0    0    1
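
As a small variation on the `outer` approach (a sketch, not part of the original answer), adding `0L` instead of `0` should keep the result as an integer matrix, which takes about half the memory of a double matrix. The names `lev` and `m` are introduced here only for illustration. Note that `outer` still builds the full dense matrix, so this reduces the footprint but does not remove the memory limit.

```r
x <- c(1, 2, 3, 3)

# Sorted distinct values, so the columns come out in a predictable order
lev <- sort(unique(x))

# Logical + integer yields an integer matrix (4 bytes per cell, not 8)
m <- outer(x, lev, "==") + 0L

# Label each column with the value it indicates
colnames(m) <- as.character(lev)

m
typeof(m)
```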

2) model.matrix. This also gives a dense matrix result:

fx <- factor(x)
model.matrix(~ fx + 0)

giving:

  fx1 fx2 fx3
1   1   0   0
2   0   1   0
3   0   0   1
4   0   0   1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$fx
[1] "contr.treatment"

3) sparseMatrix. This stores the result in a sparse internal representation, so no storage is used for the zeros.

library(Matrix)

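# OK for this example, where the values of x are the integers 1, 2, ..., k.
# Passing x = 1 stores numeric ones (class "dgCMatrix"); without it the
# result is a pattern matrix (class "ngCMatrix") that prints "|" instead of 1.
sparseMatrix(i = seq_along(x), j = x, x = 1)

# If the values of x are not the consecutive integers 1, 2, ..., k,
# map them to column numbers through a factor instead
sparseMatrix(i = seq_along(x), j = as.numeric(factor(x)), x = 1)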

giving:

4 x 3 sparse Matrix of class "dgCMatrix"

[1,] 1 . .
[2,] . 1 .
[3,] . . 1
[4,] . . 1
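
When the values are not the integers 1, 2, ..., k, it can help to keep track of which column corresponds to which value. The following is a minimal sketch of one way to do that, using the `dims` and `dimnames` arguments of `sparseMatrix`; the input values and the names `f` and `m` are made up for illustration.

```r
library(Matrix)

# Hypothetical input whose values are not consecutive integers from 1
x <- c(10, 20, 30, 30)
f <- factor(x)

m <- sparseMatrix(
  i = seq_along(f),                  # one row per observation
  j = as.integer(f),                 # column = position of the level
  x = 1,                             # store numeric ones
  dims = c(length(f), nlevels(f)),   # fix the shape explicitly
  dimnames = list(NULL, levels(f))   # label columns with the original values
)

m
```

Fixing `dims` explicitly also keeps the column count stable if the data are processed in chunks and a chunk happens to miss the highest level.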
G. Grothendieck
2
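vars <- c(1, 2, 3, 3)
# Start from an all-zero matrix with one column per value in 1..max(vars),
# then set cell (i, vars[i]) to 1 using a two-column index matrix.
data.frame(vars,
           replace(matrix(0, nrow = length(vars), ncol = max(vars)),
                   cbind(seq_along(vars), vars),
                   1))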
#  vars X1 X2 X3
#1    1  1  0  0
#2    2  0  1  0
#3    3  0  0  1
#4    3  0  0  1
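
The code above relies on the values being positive integers that can be used directly as column numbers. A possible variant for arbitrary values (a sketch, with `lev`, `dummies` and `res` introduced only for illustration) looks up each value's column with `match()`. Like the original, it builds a dense matrix, so it is subject to the same memory limits.

```r
vars <- c(1, 2, 3, 3)

# Distinct values, in sorted order, define the columns
lev <- sort(unique(vars))

# All-zero integer matrix with one named column per distinct value
dummies <- matrix(0L, nrow = length(vars), ncol = length(lev),
                  dimnames = list(NULL, paste0("X", lev)))

# Two-column index matrix: (row i, column of vars[i]) gets a 1
dummies[cbind(seq_along(vars), match(vars, lev))] <- 1L

res <- data.frame(vars, dummies)
res
```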
d.b