The problem and the desired outcome
I've got a data set like the toy data here:
mydf<- data.frame(
id = c("AB","AB","AB","BB","BB","CD","CD","CD","CD","ED"),
score = c(4,2,1,2,2,1,5,6,3,5))
And what I'd like to end up with as a data frame or table in the following format, where the column heading labels are somewhat arbitrary:
new.df <- data.frame(
id = c("AB", "BB", "CD","ED"),
score1 = c(4,2,1,5),
score2 = c(2,2,5,NA),
score3 = c(1,NA,6,NA),
score4 = c(NA,NA,3,NA)
)
The desired data frame "collapses" over id and preserves each score (i.e, I'm not trying to compute sums or averages. It seems as though the fact that there are duplicate entries in id is one reason why I'm not having much luck.
What I've tried:
Using
Reshape
Following a question posted here.library(reshape) new.df1<- reshape(mydf, idvar = "id", timevar = "score", direction = "wide") View(new.df2) #Warning message: #In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, :multiple rows match for score=2: first taken
reshape
againnew.df2<-cast(mydf, score ~ id) #Using score as value column. Use the value argument to cast to override this choice #Error in `[.data.frame`(data, , variables, drop = FALSE) : # undefined columns selected
3.Using 'tidyr'
Following a question here.
library(tidyr)
new.df2 <- spread(mydf, key = id, value = score)
#Error: Duplicate identifiers for rows (1, 2, 3), (4, 5), (6, 7, 8, 9)
And
tidyr
from a question heremydf$row <- 1:nrow(mydf) ## added explicit row numbers my.df4 <- mydf %>% gather(vars, val, -id, -row ) my.df4<- spread(my.df4, val, vars)
Using 'melt' and 'reshape'
From a great question here that I couldn't get to work:
library(reshape2)
my.df5 <- melt(mydf, id.vars = "id")
my.df5$row <- ave(my.df5$id, my.df5$id, FUN = seq_along)
# Warning messages:
# 1: In `[<-.factor`(`*tmp*`, i, value = 1:6) :
# invalid factor level, NA generated
- This post didn't help either
I'd love any suggestions as to how to accomplish the desired data frame. Thanks everyone!