I've managed to get data in the format:

run    type1
data1  12
data2  13
run    type2
data1  14
data2  15
...

I want:

run    data1 data2
type1     12    13
type2     14    15
...

I've tried cast/dcast to no avail. Any suggestions?

sample data:

data.frame(matrix(c("run","type1","data1",12,"data2",13,"run","type2","data1",14,"data3",15), ncol=2, byrow=T))
verigolfer

1 Answer

This is my suggestion:

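cast.runs <- function(d) {
  # Rows whose first column is "run" are the title rows.
  isrun <- d[[1]]=="run"
  whichrun <- which(isrun)
  # Number of rows in each run, title row included.
  lens <- diff(c(whichrun, nrow(d)+1))
  # Repeat each run label once per row of its run.
  runlabels <- inverse.rle(list(lengths=lens, values=d[[2]][whichrun]))
  # Prepend the labels as column "run" and drop the title rows.
  return(cbind(run=runlabels, d)[!isrun,])
}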

This function yields a suitable long format, which you can then recast as you see fit:

    run    X1 X2
2 type1 data1 12
3 type1 data2 13
5 type2 data1 14
6 type2 data3 15

Unsurprisingly, I start by identifying the run lines. I then count how many rows there are for each run, including the title row; that code is inspired by this answer. Next I repeat each run label that many times, and in the end I drop the title rows.
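To make those steps concrete, here is a small sketch that runs them one at a time on the sample data from the question. Passing stringsAsFactors = FALSE is an assumption added here so that the columns are plain character vectors regardless of R version; the variable names mirror those inside cast.runs.

```r
# Sample data from the question. stringsAsFactors = FALSE is an assumption
# made for this sketch so both columns are character vectors.
d <- data.frame(matrix(c("run", "type1", "data1", 12, "data2", 13,
                         "run", "type2", "data1", 14, "data3", 15),
                       ncol = 2, byrow = TRUE),
                stringsAsFactors = FALSE)

# Step 1: flag the title rows (rows 1 and 4 in this sample).
isrun <- d[[1]] == "run"
whichrun <- which(isrun)

# Step 2: rows per run, title row included. Appending nrow(d) + 1 acts as a
# sentinel so that the last run gets a length too.
lens <- diff(c(whichrun, nrow(d) + 1))

# Step 3: repeat each run label once per row of its run.
runlabels <- inverse.rle(list(lengths = lens, values = d[[2]][whichrun]))

# Step 4: prepend the labels and drop the title rows.
long <- cbind(run = runlabels, d)[!isrun, ]
```

For this sample, whichrun is 1 and 4, lens is 3 and 3, and runlabels is "type1" three times followed by "type2" three times.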

One possible way to cast this output would be using the dcast function from the reshape2 package:

> dcast(cast.runs(d), run ~ X1)
Using X2 as value column: use value.var to override.
    run data1 data2 data3
1 type1    12    13  <NA>
2 type2    14  <NA>    15
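If you would rather not depend on reshape2, the same long-to-wide step can probably be done with reshape from base R. This is only a sketch of an alternative, not part of the original suggestion: it repeats cast.runs and the sample data so that it runs on its own, and stringsAsFactors = FALSE is again an assumption made here.

```r
# The function from above, repeated so this sketch is self-contained.
cast.runs <- function(d) {
  isrun <- d[[1]] == "run"
  whichrun <- which(isrun)
  lens <- diff(c(whichrun, nrow(d) + 1))
  runlabels <- inverse.rle(list(lengths = lens, values = d[[2]][whichrun]))
  cbind(run = runlabels, d)[!isrun, ]
}

# Sample data from the question (stringsAsFactors = FALSE is an assumption).
d <- data.frame(matrix(c("run", "type1", "data1", 12, "data2", 13,
                         "run", "type2", "data1", 14, "data3", 15),
                       ncol = 2, byrow = TRUE),
                stringsAsFactors = FALSE)

long <- cast.runs(d)

# One row per run, one column per distinct value of X1.
# Base reshape() names the new columns "X2.<label>".
wide <- reshape(long, idvar = "run", timevar = "X1", direction = "wide")
```

The values are the same as in the dcast output, but the column names carry an "X2." prefix, and combinations that do not occur in the data come out as NA.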
MvG