14

I am aware of the spread function in the tidyr package but this is something I am unable to achieve. I have a data.frame with 2 columns as defined below. I need to transpose the column Subject into binary columns with 1 and 0.

Below is the data frame:

studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
         Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

> studentInfo
  StudentID Subject
1         1   Maths
2         1 Science
3         1 English
4         2   Maths
5         3 History
6         3 History

And the output I am expecting is:

  StudentID Maths Science English History
1         1     1       1       1       0
2         2     1       0       0       0
3         3     0       0       0       1

How can I do this with the spread() function or any other function.

Henrik
  • 65,555
  • 14
  • 143
  • 159
sachinv
  • 492
  • 2
  • 5
  • 18

3 Answers3

16

Using reshape2 we can dcast from long to wide.

As you only want a binary outcome we can unique the data first

library(reshape2)

si <- unique(studentInfo)
dcast(si, formula = StudentID ~ Subject, fun.aggregate = length)

#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0

Another approach using tidyr and dplyr is

library(tidyr)
library(dplyr)

studentInfo %>%
  mutate(yesno = 1) %>%
  distinct %>%
  spread(Subject, yesno, fill = 0)

#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0

Although I'm not a fan (yet) of tidyr syntax...

SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
8

Using tidyr :

library(tidyr)
studentInfo <- data.frame(
  StudentID = c(1,1,1,2,3,3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

pivot_wider(studentInfo,
            names_from = "Subject", 
            values_from = 'Subject', 
            values_fill = 0,
            values_fn = function(x) 1)
#> # A tibble: 3 x 5
#>   StudentID Maths Science English History
#>       <dbl> <int>   <int>   <int>   <int>
#> 1         1     1       1       1       0
#> 2         2     1       0       0       0
#> 3         3     0       0       0       1

Created on 2019-09-19 by the reprex package (v0.3.0)

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
  • This solution is great...could you please elaborate a bit on the ~+(as.logical(length(.))))). Specifically could you point me to where would I find more documentation on use of "+"? Thanks! – thisisrg Apr 20 '20 at 02:21
  • 1
    Thanks thisisrg, here `+` will transform a logical into an integer, `+TRUE` is 1. Thus the result of the call is always 1 unless the length is zero. – moodymudskipper Apr 20 '20 at 09:15
  • Thanks. `pivot_wider` actually complex for creating dummy vars but your code also helped me do this with T/F values. I had to change the `data.frame` to `tibble` and won't work without that, not sure why. ```r studentInfo <- tibble(StudentID = c(1,1,1,2,3,3), Subject = c("Maths", "Science", "English", "Maths", "History", "History")) pivot_wider(studentInfo, names_from = Subject, values_from = Subject, values_fill = list(Subject = F), values_fn = list(Subject = is.character)) ``` – jameshowison Jun 09 '20 at 17:36
  • Hi James, on my system your code gives an identical output whether I use data.frame or tibble (latest released versions of R and tidyr). – moodymudskipper Jun 10 '20 at 00:56
7

We can use table from base R

+(table(studentInfo)!=0)
#            Subject
#StudentID English History Maths Science
 #       1       1       0     1       1
 #       2       0       0     1       0
 #       3       0       1     0       0
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Wow, that's a very elegant way to use. But in case of larger data frames, it is giving message like "Error in table : attempt to make a table with >= 2^31 elements" – sachinv Feb 27 '16 at 10:53
  • @sachinv You might have crossed the limit of the number of observations. – akrun Feb 27 '16 at 10:54