A while ago I posted a question here about how to convert a factor data.frame into a binary (one-hot encoded) data.frame. Now I am trying to find the most efficient way to binarize a factor variable across trials (rows). A minimal example looks like this:

d = data.frame(
    Trial = c(1,2,3,4,5,6,7,8,9,10),
    Category = c('a','b','b','b','a','b','a','a','b','a')
)
d

   Trial Category
1      1        a
2      2        b
3      3        b
4      4        b
5      5        a
6      6        b
7      7        a
8      8        a
9      9        b
10    10        a

I would like to get this instead:

   Trial  a  b
1      1  1  0
2      2  0  1
3      3  0  1
4      4  0  1
5      5  1  0
6      6  0  1
7      7  1  0
8      8  1  0
9      9  0  1
10    10  1  0

What would be the most efficient way of doing it?

striatum

1 Answer

Here is an option with pivot_wider. Create a column of 1s, then call pivot_wider with names_from set to 'Category' and values_from set to the newly created column. Trial/Category combinations that do not occur are filled with 0 through values_fill.

library(dplyr)
library(tidyr)
d %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = Category, values_from = n, values_fill = list(n = 0))
# A tibble: 10 x 3
#   Trial     a     b
#   <dbl> <dbl> <dbl>
# 1     1     1     0
# 2     2     0     1
# 3     3     0     1
# 4     4     0     1
# 5     5     1     0
# 6     6     0     1
# 7     7     1     0
# 8     8     1     0
# 9     9     0     1
#10    10     1     0

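A more efficient option for large data would be dcast from data.table. Using length as the aggregation function counts the rows for each Trial/Category combination, which gives 1 where the category occurs and 0 elsewhere: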

library(data.table)
dcast(setDT(d), Trial ~ Category, length)

It can also be done in base R with table, which returns a contingency table (an object of class "table") rather than a data.frame:

table(d)
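
If a plain data.frame is needed from base R, one possible sketch uses model.matrix instead of table. This swaps in a different technique from the one above: the formula `~ Category - 1` drops the intercept so that each level gets its own indicator column. The names `m` and `res` are only illustrative.

```r
# Rebuild the example data so the snippet is self-contained
d <- data.frame(
  Trial = 1:10,
  Category = c('a','b','b','b','a','b','a','a','b','a')
)

# "- 1" removes the intercept, so every level of Category gets a 0/1 column
m <- model.matrix(~ Category - 1, data = d)

# model.matrix prefixes column names with the variable name ("Categorya", ...)
colnames(m) <- sub("^Category", "", colnames(m))

# Bind the indicator columns back to the Trial column, preserving row order
res <- cbind(d["Trial"], m)
res
```

Unlike table, this keeps the rows in their original order and does not require Trial to be unique.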
akrun