-2

I would like to reshape the following data frame dat:

dat<-data.frame( ID=c(100,101,101,101,102,103), DEGREE=c("BA","BA","MS","PHD","BA","BA"), YEAR=c(1980,1990, 1992, 1996, 2000, 2004)

dat

Where ID is an ID number tied to an individual, DEGREE is the type of DEGREE EARNED, and YEAR is the year in which the degree was earned. In this case, ID 101 earned a BA, MS, and PHD.

I would like to reshape the data into a wide format such that there are columns for each degree, but are not named after the degree value themselves. Additionally, I would like the years corresponding to each degree.

Like such :

finaldat

The main point being that I do not want to create new columns based on the values of DEGREE or YEAR (this is what happens when I attempted to use spread, dcast, etc).

Thank you!

klee907
  • 5
  • 2
  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](https://stackoverflow.com/questions/5963269). Since your question has been closed as duplicate, you should post the code that causes you problem if you are convinced that it is not a duplicate. – Ralf Stubner Jun 19 '18 at 21:36

1 Answers1

0

You can try tidyr::spread as:

library(tidyr)

spread(dat, DEGREE,YEAR)
#   ID   BA   MS  PHD
#1 100 1980   NA   NA
#2 101 1990 1992 1996
#3 102 2000   NA   NA
#4 103 2004   NA   NA

OR: Solution to have columns as Degree_1, Degree_2 and Degree_3 can be as

library(tidyverse)

dat %>% group_by(ID) %>%
  mutate(DegreeNum = paste("Degree", row_number(), sep = "_")) %>%
  unite(DEGREE, c("YEAR", "DEGREE"), sep = ":") %>%
  spread(DegreeNum, DEGREE, fill = "") %>%
  as.data.frame()

#    ID Degree_1 Degree_2 Degree_3
# 1 100  1980:BA                  
# 2 101  1990:BA  1992:MS 1996:PHD
# 3 102  2000:BA                  
# 4 103  2004:BA 

Edited: Based on feedback from OP. One can spread both DEGREE and YEAR together using data.table::dcast as:

library(data.table)

setDT(dat)
dat[,DegreeNum := 1:.N, by=ID]
dcast(dat, ID ~ DegreeNum, value.var = c("DEGREE", "YEAR"))

#     ID DEGREE_1 DEGREE_2 DEGREE_3 YEAR_1 YEAR_2 YEAR_3
# 1: 100       BA       NA       NA   1980     NA     NA
# 2: 101       BA       MS      PHD   1990   1992   1996
# 3: 102       BA       NA       NA   2000     NA     NA
# 4: 103       BA       NA       NA   2004     NA     NA
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Thank you, I have tried this but the problem is that I don't want to create new columns named after the degree type. The reason is because this is part of a larger dataframe with many different degree types. Because no ID has more than 3 or so degrees, it would be optimal for me to have a column for the first degree, a column for the second, and so on. – klee907 Jun 19 '18 at 20:57
  • @klee907 I have updated my answer. Have a look. Perhaps that will resolve your problem. – MKR Jun 19 '18 at 22:09
  • Do you know how I would be able to reshape it into a data frame without uniting? In my final data frame I would need the type of Degree and the corresponding degree year to be separate. Something like the image I posted above: https://i.stack.imgur.com/SaNGT.png – klee907 Jun 20 '18 at 03:25
  • @klee907 You can do this very easily with `data.table`. Have a look for updated answer. – MKR Jun 20 '18 at 06:26