1

This is slightly different with long to wide format. (please don't report duplicated)

I have data as below. I would like to transpose based on term column with the corresponding value in subject column. the result would be like df_result:

DF <- data.frame(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
             term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
             subject = c("math1", "phys1", "math2", "chem1", "cmp1", "math1", "phys1", "math2", "math1", "phys1"),
             graduation = c ("grad", "grad", "grad", "grad", "grad", "drop", "drop", "drop", "enrolled", "enrolled"))

Df

ID   term   subject   graduation
10    1      math1      grad
10    1      phys1      grad
10    2      math2      grad
10    2      chem1      grad
10    3      cmp1       grad
11    1      math1      drop
11    1      phys1      drop
11    2      math2      drop
12    1      math1      enrolled
12    1      phys1      enrolled

Df_result:

ID  term1  term2  term3   graduation
10  math1  math2  cmp1     grad
10  phys1  chem1  NA       grad
11  math1  math2  NA       drop
11  phys1   NA    NA       drop
12  math1   NA    NA       Enrolled
12  math2   NA    NA       Enrolled

Using reshape produce close to what I want but it only keeps the first match.

resjape(DF, idvar = c("ID","graduation"), timevar = "term", direction = "wide") 

it produces:

  ID graduation subject.1 subject.2 subject.3
1 10       grad     math1     math2      cmp1
6 11       drop     math1     math2      <NA>
9 12   enrolled     math1      <NA>      <NA>

The problem is timevar only keeps the first match. using dcast and melt only fill the data with function length.

how can I solve it in R?

Cina
  • 9,759
  • 4
  • 20
  • 36

1 Answers1

2

This is the same as a reshape from long-to-wide, but you need a new variable to help you uniquely identify a row in the new format. I call this variable classnum below and I use data.table's syntax to help me create it:

# add helper variable "classnum"
library(data.table)
setDT(DF)
DF[ , classnum := 1:.N, by=.(ID, term)]

#reshape long-to-wide
tidyr::spread(DF, term, subject)

Result:

   ID graduation classnum     1     2    3
1: 10       grad        1 math1 math2 cmp1
2: 10       grad        2 phys1 chem1 <NA>
3: 11       drop        1 math1 math2 <NA>
4: 11       drop        2 phys1  <NA> <NA>
5: 12   enrolled        1 math1  <NA> <NA>
6: 12   enrolled        2 phys1  <NA> <NA>
DanY
  • 5,920
  • 1
  • 13
  • 33
  • 1
    Your answer is quite similar to jaap's answer in the dupe post. – lmo Dec 01 '18 at 23:05
  • 1
    I didn't see that (and I think I managed to slap this down before the dup was flagged). We closed almost this exact question yesterday as a dup. Since the author reposted, I figured an answer was needed. On the bright side, Jaap and I used different functions so we provide alternative ways of getting the job done. – DanY Dec 01 '18 at 23:08
  • In the spirit of the pedantic, jaap supplied two answers in one posted, a `data.table` solution and a "tidyverse" solution that uses `spread`. To be honest, I've posted my fair share of similar (near) dupe posts in this same scenario. – lmo Dec 01 '18 at 23:16