A common task with the data I work with is reshaping client data from long to wide. I have a process for doing this with base R's reshape(), outlined below, that creates new (but unmodified) columns with a numeric index appended. In my case I do not want to modify the data in any way. Because I often use reshape2 for other operations, my question is how this can be accomplished with dcast. It does not seem that the example data need to be melted by id, for example, but I'm not sure how I would go about making them wide. Would anyone be able to provide reshape2 code that produces a data frame comparable to "wide" in the example below?

Thanks.

Example

date_up   <- as.numeric(as.Date("1990/01/01"))
date_down <- as.numeric(as.Date("1960/01/01"))
ids <- data.frame(id=rep(1:1000, 3),site=rep(c("NMA", "NMB","NMC"), 1000))
ids <- ids[order(ids$id), ]
dates <-  data.frame(datelast=runif(3000, date_down, date_up),
          datestart=runif(3000, date_down, date_up),
          dateend=runif(3000, date_down, date_up),
          datemiddle=runif(3000, date_down, date_up))
# Convert only the three named columns to Date; datelast stays numeric
date_cols <- c("datestart", "dateend", "datemiddle")
dates[date_cols] <- lapply(dates[date_cols], as.Date, origin = "1970-01-01")
df <- cbind(ids, dates)

# Make a within group index and reshape df
df$gid <- with(df, ave(rep(1, nrow(df)), df[,"id"], FUN = seq_along))
wide <- reshape(df, idvar = "id", timevar = "gid", direction = "wide")
Derek Darves
  • At the moment one needs to run this twice, with an initial error that most R newbies would find puzzling: it mentions a "closure" because the object `df` is the F-density function in R. The second time around, there is a `df` data object and so no error occurs. (I only made a 30-row matrix to work with.) – IRTFM Jan 25 '16 at 19:12
  • You are correct, thanks for pointing that out. I updated the code to fix the error. – Derek Darves Jan 26 '16 at 13:29

1 Answer

We can use dcast from data.table, which accepts multiple value.var columns (supported in data.table 1.9.6 and later). Convert the 'data.frame' to a 'data.table' with setDT(df), then call dcast with the formula id ~ gid and the columns to spread listed in value.var.

library(data.table)
dcast(setDT(df), id~gid, value.var=names(df)[2:6])

NOTE: The data.table method is typically faster than the reshape2 equivalent.
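
To make the shape of the result concrete, here is a minimal sketch on a hypothetical four-row table. The names `toy`, `start`, and `wide_dt` are illustrative assumptions and are not part of the question's data. It shows how each value.var column gets the gid value appended to its name; as far as I know, column types such as Date are kept as they are.

```r
library(data.table)

# Hypothetical toy data: two ids with two records each
toy <- data.table(
  id    = c(1, 1, 2, 2),
  site  = c("NMA", "NMB", "NMA", "NMB"),
  start = as.Date(c("1990-01-01", "1990-02-01", "1991-01-01", "1991-02-01"))
)

# Within-group index, playing the role of gid in the question
toy[, gid := seq_len(.N), by = id]

# One row per id; each value.var column is spread across the gid values
wide_dt <- dcast(toy, id ~ gid, value.var = c("site", "start"))

# Column names are built as <value.var>_<gid>
print(names(wide_dt))
```

With the question's data, the same naming pattern would apply to all five columns passed in value.var.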

akrun
  • Thanks for this. More and more I am wondering if I should move my batch processes over to data.table, as there seem to be lots of solutions with this package and its assorted helpers. For now it seems that reshape2's dcast will only do the operation I'd like on one value var, meaning I would need to melt the data first. Thanks again for this approach, I will give it a shot. – Derek Darves Jan 26 '16 at 21:50
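
The melt-first route mentioned in the comment above might look like the sketch below. It uses a hypothetical toy frame (`toy`, `score`, and `wide_rs2` are assumed names, not from the question). One caveat to be aware of: reshape2's melt stacks all measure columns into a single `value` column, so mixing character and numeric columns coerces everything to character, which conflicts with the goal of leaving the data unmodified.

```r
library(reshape2)

# Hypothetical toy data: two ids with two records each
toy <- data.frame(
  id    = c(1, 1, 2, 2),
  site  = c("NMA", "NMB", "NMA", "NMB"),
  score = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Within-group index, as in the question
toy$gid <- ave(toy$id, toy$id, FUN = seq_along)

# Step 1: melt to one row per (id, gid, variable);
# site and score share one character 'value' column after this step
long <- reshape2::melt(toy, id.vars = c("id", "gid"))

# Step 2: cast back to wide, one column per variable/gid combination
wide_rs2 <- reshape2::dcast(long, id ~ variable + gid, value.var = "value")

# Inspect the result: note that the score columns are now character
str(wide_rs2)
```

If the original column types matter, the data.table call in the answer, or the base reshape() call in the question, avoids this coercion step.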