0

Posting a second question because my first was marked as a duplicate. I apologize in advance if there already is a question that addresses this specific issue.

I started out with a dataframe as follows:

dat<-data.frame(
ID=c(100,101,101,101,102,103),
DEGREE=c("BA","BA","MS","PHD","BA","BA"),
YEAR=c(1980,1990, 1992, 1996, 2000, 2004))

> dat
ID DEGREE YEAR
100     BA 1980
101     BA 1990
101     MS 1992
101    PHD 1996
102     BA 2000
103     BA 2004

ID 101 earned a BA in 1990, an MS in 1992, and a PHD in 1996.

I want to reshape this dataframe into a wide format that ultimately looks like this:

 ID DEGREE_1 DEGREE_2 DEGREE_3 YEAR_DEGREE_1 YEAR_DEGREE_2 YEAR_DEGREE_3
 100    BA                           1980                            
 101    BA      MS      PHD          1990        1992          1996
 102    BA                           2000                            
 103    BA                           2004           

With help from an answer to my original question, I attempted to create my new data frame using the following code:

dat$DEGREE<-as.character(dat$DEGREE)
dat %>% group_by(ID) %>%
mutate(DegreeNum = paste("Degree", row_number(), sep = "_"))%>%
mutate(DegreeYear = paste("YearDegree", row_number(), sep = "_"))%>%
spread(DegreeNum, DEGREE, fill = "")%>%
spread(DegreeYear,YEAR,fill="")%>%
as.data.frame()

 ID Degree_1 Degree_2 Degree_3 YearDegree_1 YearDegree_2 YearDegree_3
 100   BA                           1980                          
 101                    PHD                                  1996
 101            MS                               1992             
 101   BA                           1990                          
 102   BA                           2000                          
 103   BA                           2004    

This is as far as I was able to get, but cannot figure out how to reshape it into a dataframe so that everything from ID 101 is in one row. Any help would be appreciated.

klee907
  • 5
  • 2
  • 4
    `tidyr` deliberately makes this hard as far as I know - [Hadley](https://stackoverflow.com/questions/29775461/how-can-i-spread-repeated-measures-of-multiple-variables-into-wide-format#comment47685477_29776899): "*the goal of tidyr is to make your data tidy, so you shouldn't expect doing the opposite to be easier*" Base R's boring old `reshape` does it one line `reshape(transform(dat, time=ave(ID,ID,FUN=seq_along)), idvar="ID", direction="wide", sep="")`. Nevertheless, I think this is a duplicate of that question – thelatemail Jun 20 '18 at 04:04

1 Answers1

0

Not so hard with tidyverse...

df<-data.frame(ID=c(100,101,101,101,102,103),
           DEGREE=c("BA","BA","MS","PHD","BA","BA"),
             YEAR=c(1980,1990, 1992, 1996, 2000, 2004),
             stringsAsFactors=FALSE)

df1 <- df %>% select(-3) %>% group_by(ID) %>% mutate(i=row_number()) %>%
       as.data.frame() %>%
       reshape(direction="wide",idvar="ID",v.names="DEGREE",timevar="i",sep="_")
df1[is.na(df1)] <- ""

df2 <- df %>% select(-2) %>% group_by(ID) %>% mutate(i=row_number()) %>%
       as.data.frame() %>%
       reshape(direction="wide",idvar="ID",v.names="YEAR",timevar="i",sep="_")
df2[is.na(df2)] <- ""

inner_join(df1,df2,"ID")
#   ID DEGREE_1 DEGREE_2 DEGREE_3 YEAR_1 YEAR_2 YEAR_3
#1 100       BA                     1980              
#2 101       BA       MS      PHD   1990   1992   1996
#3 102       BA                     2000              
#4 103       BA                     2004              
Nicolas2
  • 2,170
  • 1
  • 6
  • 15
  • a) this is not tidyverse - `reshape` is a base R function, b) `reshape` can do this all in a single call as it is able to 'spread' multiple variables at once - `reshape(transform(dat, time=ave(ID,ID,FUN=seq_along)), idvar="ID", direction="wide", sep="")` as commented above. – thelatemail Jun 21 '18 at 08:11
  • @thelatemail. Of course and '<-' is not part of the tidyverse too. But being very new to R I had some difficulties to understand your "ave(...)" : a shorter answer is not allways the best. Also I wanted to produce the exact example given by klee907. For me it perfectly makes sense. – Nicolas2 Jun 21 '18 at 09:43