1

The problem and the desired outcome

I've got a data set like the toy data here:

mydf<-   data.frame(
id = c("AB","AB","AB","BB","BB","CD","CD","CD","CD","ED"),
score = c(4,2,1,2,2,1,5,6,3,5))

And what I'd like to end up with as a data frame or table in the following format, where the column heading labels are somewhat arbitrary:

new.df <- data.frame(
id = c("AB", "BB", "CD","ED"),  
score1 = c(4,2,1,5), 
score2 = c(2,2,5,NA), 
score3 = c(1,NA,6,NA), 
score4 = c(NA,NA,3,NA)
)

The desired data frame "collapses" over id and preserves each score (i.e, I'm not trying to compute sums or averages. It seems as though the fact that there are duplicate entries in id is one reason why I'm not having much luck.

What I've tried:

  1. Using Reshape Following a question posted here.

    library(reshape)
    new.df1<-
    reshape(mydf, idvar = "id", timevar = "score", direction = "wide")
    View(new.df2)
    
    #Warning message:
    #In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,     :multiple rows match for score=2: first taken
    
  2. reshape again

    new.df2<-cast(mydf, score ~  id)
    #Using score as value column.  Use the value argument to cast to override this choice
    #Error in `[.data.frame`(data, , variables, drop = FALSE) : 
    #  undefined columns selected
    

3.Using 'tidyr'

Following a question here.

  library(tidyr)
  new.df2 <- spread(mydf, key = id, value = score)
  #Error: Duplicate identifiers for rows (1, 2, 3), (4, 5), (6, 7, 8, 9)
  1. And tidyr from a question here

    mydf$row <- 1:nrow(mydf) ## added explicit row numbers
    my.df4 <- 
      mydf %>%
      gather(vars, val, -id, -row )
    
    my.df4<-  spread(my.df4, val, vars)
    
  2. Using 'melt' and 'reshape'

From a great question here that I couldn't get to work:

library(reshape2)
my.df5 <- melt(mydf, id.vars = "id")
my.df5$row <- ave(my.df5$id, my.df5$id, FUN = seq_along)
 # Warning messages:
  # 1: In `[<-.factor`(`*tmp*`, i, value = 1:6) :
  # invalid factor level, NA generated
  1. This post didn't help either

I'd love any suggestions as to how to accomplish the desired data frame. Thanks everyone!

Cœur
  • 37,241
  • 25
  • 195
  • 267
Steve'sConnect
  • 145
  • 2
  • 8
  • 1
    change `ave(my.df5$id, my.df5$id, FUN = seq_along)` to `ave(seq_along(my.df5$id), my.df5$id, FUN = seq_along)` and it should work. or use `reshape(mydf, dir = 'wide', idvar = 'id', timevar = 'row', v.names = 'score', sep = '')` – rawr Feb 07 '16 at 23:15

0 Answers0