Turn two-column dataframe with duplicate identifiers from long to wide, preserving scores

Question

The problem and the desired outcome

I've got a data set like the toy data here:

mydf<-   data.frame(
id = c("AB","AB","AB","BB","BB","CD","CD","CD","CD","ED"),
score = c(4,2,1,2,2,1,5,6,3,5))

And what I'd like to end up with as a data frame or table in the following format, where the column heading labels are somewhat arbitrary:

new.df <- data.frame(
id = c("AB", "BB", "CD","ED"),  
score1 = c(4,2,1,5), 
score2 = c(2,2,5,NA), 
score3 = c(1,NA,6,NA), 
score4 = c(NA,NA,3,NA)
)

The desired data frame "collapses" over id and preserves each score (i.e, I'm not trying to compute sums or averages. It seems as though the fact that there are duplicate entries in id is one reason why I'm not having much luck.

What I've tried:

Using Reshape Following a question posted here.

library(reshape)
new.df1<-
reshape(mydf, idvar = "id", timevar = "score", direction = "wide")
View(new.df2)

#Warning message:
#In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,     :multiple rows match for score=2: first taken

reshape again

new.df2<-cast(mydf, score ~  id)
#Using score as value column.  Use the value argument to cast to override this choice
#Error in `[.data.frame`(data, , variables, drop = FALSE) : 
#  undefined columns selected

3.Using 'tidyr'

Following a question here.

  library(tidyr)
  new.df2 <- spread(mydf, key = id, value = score)
  #Error: Duplicate identifiers for rows (1, 2, 3), (4, 5), (6, 7, 8, 9)

And tidyr from a question here

mydf$row <- 1:nrow(mydf) ## added explicit row numbers
my.df4 <- 
  mydf %>%
  gather(vars, val, -id, -row )

my.df4<-  spread(my.df4, val, vars)

Using 'melt' and 'reshape'

From a great question here that I couldn't get to work:

library(reshape2)
my.df5 <- melt(mydf, id.vars = "id")
my.df5$row <- ave(my.df5$id, my.df5$id, FUN = seq_along)
 # Warning messages:
  # 1: In `[<-.factor`(`*tmp*`, i, value = 1:6) :
  # invalid factor level, NA generated

This post didn't help either

I'd love any suggestions as to how to accomplish the desired data frame. Thanks everyone!

change `ave(my.df5$id, my.df5$id, FUN = seq_along)` to `ave(seq_along(my.df5$id), my.df5$id, FUN = seq_along)` and it should work. or use `reshape(mydf, dir = 'wide', idvar = 'id', timevar = 'row', v.names = 'score', sep = '')` — rawr, Feb 07 '16 at 23:15

Turn two-column dataframe with duplicate identifiers from long to wide, preserving scores

The problem and the desired outcome

What I've tried:

0 Answers0