
I have a dataset with repeating scores; the data frame is as follows:

ID,Variable,Category
1,6,A
2,4,C
3,3,D
4,4,C
5,5,B
6,3,D
7,6,A
8,4,C
9,5,B
10,3,D
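The answer below refers to this data as `dat` without showing how it was built. Here is a minimal way to recreate it, assuming the name `dat` and reading the text above with base R's `read.csv`:

```r
# Rebuild the example data from the question as a data.frame named `dat`
# (the name is an assumption; it is what the answer below uses).
dat <- read.csv(text = "ID,Variable,Category
1,6,A
2,4,C
3,3,D
4,4,C
5,5,B
6,3,D
7,6,A
8,4,C
9,5,B
10,3,D")

# Inspect the column types: ID and Variable are integer; Category is
# character in R >= 4.0 (a factor in older versions).
str(dat)
```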

I want to turn it into a 0/1 indicator table like this, with one column per category:

ID,A,B,C,D
1,1,0,0,0
2,0,0,1,0
3,0,0,0,1
4,0,0,1,0
5,0,1,0,0
6,0,0,0,1
7,1,0,0,0
8,0,0,1,0
9,0,1,0,0
10,0,0,0,1
Massive44


Three options.

  1. Using base R's xtabs. This doesn't technically return a data.frame; it returns an object of class c("xtabs", "table"), and converting that with as.data.frame gives the long format shown at the end, not the wide layout one might expect:

    xtabs(~ID + Category, data=dat)
    #     Category
    # ID   A B C D
    #   1  1 0 0 0
    #   2  0 0 1 0
    #   3  0 0 0 1
    #   4  0 0 1 0
    #   5  0 1 0 0
    #   6  0 0 0 1
    #   7  1 0 0 0
    #   8  0 0 1 0
    #   9  0 1 0 0
    #   10 0 0 0 1
    class(xtabs(~ID + Category, data=dat))
    # [1] "xtabs" "table"
    head(as.data.frame(xtabs(~ID + Category, data=dat)))
    #   ID Category Freq
    # 1  1        A    1
    # 2  2        A    0
    # 3  3        A    0
    # 4  4        A    0
    # 5  5        A    0
    # 6  6        A    0
    
  2. Using tidyr::pivot_wider:

    tidyr::pivot_wider(dat, id_cols = ID, names_from = Category, values_from = Variable, values_fill = list(Variable = 0))
    # # A tibble: 10 x 5
    #       ID     A     C     D     B
    #    <int> <int> <int> <int> <int>
    #  1     1     6     0     0     0
    #  2     2     0     4     0     0
    #  3     3     0     0     3     0
    #  4     4     0     4     0     0
    #  5     5     0     0     0     5
    #  6     6     0     0     3     0
    #  7     7     6     0     0     0
    #  8     8     0     4     0     0
    #  9     9     0     0     0     5
    # 10    10     0     0     3     0
    
  3. Using data.table::dcast:

    library(data.table)
    dcast(as.data.table(dat), ID~Category, value.var = "Variable", fill = 0)
    #     ID A B C D
    #  1:  1 6 0 0 0
    #  2:  2 0 0 4 0
    #  3:  3 0 0 0 3
    #  4:  4 0 0 4 0
    #  5:  5 0 5 0 0
    #  6:  6 0 0 0 3
    #  7:  7 6 0 0 0
    #  8:  8 0 0 4 0
    #  9:  9 0 5 0 0
    # 10: 10 0 0 0 3
    

Options 2 and 3 do not produce your literal output, because they spread the actual Variable values into the cells. That shows their flexibility, though: you can make them return only 0s and 1s by first setting dat$Variable <- 1L.
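A sketch of that 1L trick applied to option 3, assuming data.table is installed and the question's data is in `dat` (rebuilt inline here so the snippet runs on its own). It works on a copy, `dat01` (a hypothetical name), so the original scores are kept:

```r
library(data.table)

# The question's data, assumed to be stored in `dat`.
dat <- data.frame(
  ID       = 1:10,
  Variable = c(6L, 4L, 3L, 4L, 5L, 3L, 6L, 4L, 5L, 3L),
  Category = c("A", "C", "D", "C", "B", "D", "A", "C", "B", "D")
)

# Overwrite the scores with 1 on a copy, so every present
# ID/Category combination will show up as 1.
dat01 <- dat
dat01$Variable <- 1L

# Spread to wide format; fill = 0L puts 0 in combinations that do not occur.
wide01 <- dcast(as.data.table(dat01), ID ~ Category,
                value.var = "Variable", fill = 0L)
wide01
```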

r2evans
  • Thanks a lot, the first one is the one I wanted. – Massive44 Jun 19 '20 at 21:30
  • Suggestions: `dcast(dat, ID ~ Category, value.var = "Variable", function(x) as.logical(x)+0, fill = 0)` and `as.data.frame.matrix(xtabs(~ID + Category, data=dat))`. – A5C1D2H2I1M1N2O1R2T1 Jun 20 '20 at 03:17
  • I hadn't tried the `.matrix` variant around `xtabs`, great recommendation; and I should have thought about using a literal function for `dcast`, I don't feel proficient with it yet. Thanks! – r2evans Jun 20 '20 at 15:09
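The `as.data.frame.matrix()` suggestion keeps the wide layout that xtabs prints, with the IDs stored as row names. A minimal sketch, assuming the question's data in `dat` (rebuilt inline here). The final step that moves the row names back into an `ID` column is an illustrative addition, not part of the comment:

```r
# The question's data, assumed to be stored in `dat`.
dat <- data.frame(
  ID       = 1:10,
  Variable = c(6, 4, 3, 4, 5, 3, 6, 4, 5, 3),
  Category = c("A", "C", "D", "C", "B", "D", "A", "C", "B", "D")
)

# Count ID/Category combinations, then keep the matrix-shaped layout
# instead of the long Freq format that as.data.frame() would give.
wide <- as.data.frame.matrix(xtabs(~ ID + Category, data = dat))

# Illustrative extra step: turn the row names ("1".."10") into an ID column.
wide <- cbind(ID = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```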