5

The melt/cast functions in the reshape package are great, but I'm not sure if there is a simple way to apply them when measured variables are of different types. For example, here is a snippet from data where each MD provides the gender and weight of three patients:

ID PT1 WT1 PT2 WT2 PT3 WT3
1  "M" 170 "M" 175 "F" 145
...

where the objective is to reshape so each row is a patient:

ID PTNUM GENDER WEIGHT
1    1     "M"    170
1    2     "M"    175
1    3     "F"    145
...

Using the reshape function in the stats package is one option of which I'm aware, but I'm posting here in the hopes that R users more experienced than me will post other, hopefully better methods. Many thanks!

--

@Vincent Zoonekynd :

I liked your example a lot, so I generalized it to multiple variables.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  p1 = sample(c("M","F"),n,replace=TRUE),
  q1 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w1 = round(runif(n,100,200)),
  y1 = round(runif(n,100,200)),
  p2 = sample(c("M","F"),n,replace=TRUE),
  q2 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w2 = round(runif(n,100,200)),
  y2 = round(runif(n,100,200)),
  p3 = sample(c("M","F"),n,replace=TRUE),
  q3 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w3 = round(runif(n,100,200)),
  y3 = round(runif(n,100,200))
  )
# Reshape the data.frame, one variable at a time
library(reshape)
d1 <- melt(d, id.vars="id", measure.vars=c("p1","p2","p3","q1","q2","q3"))
d2 <- melt(d, id.vars="id", measure.vars=c("w1","w2","w3","y1","y2","y3"))
d1 = cbind(d1,colsplit(d1$variable,names=c("var","ptnum")))
d2 = cbind(d2,colsplit(d2$variable,names=c("var","ptnum")))
d1$variable = NULL
d2$variable = NULL
d1c = cast(d1,...~var)
d2c = cast(d2,...~var)
# Join the two data.frames
d3 = merge(d1c, d2c, by=c("id","ptnum"), all=TRUE)

--

Final thoughts: my motivation for this question was to learn about alternatives to the reshape package other than the stats::reshape function. For the moment, I've reached the following conclusions:

  • Stick to stats::reshape when you can. As long as you remember to use a list rather than a simple vector for the "varying" argument, you'll stay out of trouble. For smaller data sets--a few thousand patient cases with less than 200 variables in total is what I was dealing with this time--the lower speed of this function is worth the simplicity of the code.

  • To use the cast/melt approach in Hadley Wickham's reshape (or reshape2) package, you have to split your variables into two sets, one consisting of numeric variables and another of character variables. When your data set is large enough that you find stats::reshape unbearable, I imagine the extra step of dividing your variables into two sets won't seem so bad.

  • 3
    Better in what sense? reshape is meant for tasks just like this, so why not use it? – Fojtasek Feb 18 '12 at 16:37
  • Reshape works well as long as one remembers to use the canonical form of the varying argument (a list): [link](http://www.mail-archive.com/r-help@r-project.org/msg160715.html) With R, I'm constantly surprised by the existence of alternatives of which I was unaware, so I thought someone would post other approaches. One example of defining "better" would be speed: in my admittedly subjective experience, cast/melt seem faster than the built-in reshape. – Krishna Tateneni Feb 19 '12 at 00:50
  • I didn't notice this, thanks. I've now changed the library from reshape2 to reshape. You can still use reshape2 if you like: you'll have to add the pattern argument with a value of "" to the colsplit() call. – Krishna Tateneni Jun 07 '13 at 18:07

2 Answers2

3

You can process each variable separately, and join the resulting two data.frames.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  pt1 = sample(c("M","F"),n,replace=TRUE),
  wt1 = round(runif(n,100,200)),
  pt2 = sample(c("M","F"),n,replace=TRUE),
  wt2 = round(runif(n,100,200)),
  pt3 = sample(c("M","F"),n,replace=TRUE),
  wt3 = round(runif(n,100,200))
)
# Reshape the data.frame, one variable at a time
library(reshape2)
d1 <- melt(d, 
  id.vars="id", measure.vars=c("pt1","pt2","pt3"), 
  variable.name="patient", value.name="gender"
)
d2 <- melt(d, 
  id.vars="id", measure.vars=c("wt1","wt2","wt3"), 
  variable.name="patient", value.name="weight"
)
d1$patient <- as.numeric(gsub("pt", "", d1$patient))
d2$patient <- as.numeric(gsub("wt", "", d1$patient))
# Join the two data.frames
merge(d1, d2, by=c("id","patient"), all=TRUE)
Vincent Zoonekynd
  • 31,893
  • 5
  • 69
  • 78
  • Thank you, this makes sense. The code is longer than that required for stats::reshape, but is probably worth the efficiency gained when there are a lot of variables, which is often the case for me. – Krishna Tateneni Feb 19 '12 at 00:58
2

I think the reshape function in the stats package is simplest. Here is a simple example, does this do what you want?

> tmp
  id val val2 cat
1  1   1   14   a
2  1   2   13   b
3  2   3   12   b
4  2   4   11   a
> tmp2 <- tmp
> tmp2$t <- ave(tmp2$val, tmp2$id, FUN=seq_along)
> tmp2
  id val val2 cat t
1  1   1   14   a 1
2  1   2   13   b 2
3  2   3   12   b 1
4  2   4   11   a 2
> reshape(tmp2, idvar='id', timevar='t', direction='wide')
  id val.1 val2.1 cat.1 val.2 val2.2 cat.2
1  1     1     14     a     2     13     b
3  2     3     12     b     4     11     a

Hopefully your patients sex is not changing each appointment, but there could be other categorical variables that change between visits

Greg Snow
  • 48,497
  • 6
  • 83
  • 110
  • What I had in mind was multiple patients per MD, not the same patient measured on multiple occasions. But I agree that code with stats::reshape is definitely more compact, as long as one is aware of the potential gotchas: [see link](http://www.mail-archive.com/r-help@r-project.org/msg160715.html) – Krishna Tateneni Feb 19 '12 at 19:01
  • So in my example use id as the physician id and the multiple values represent patients. – Greg Snow Feb 20 '12 at 18:48