2

Consider the following dataset which is taken from this question: Going from wide to long w/ coupled-columns: Is there a more R way to do this (i.e. - without using a for loop)?

(dput at the end of the question)

   phrase wo1sp wo2sp wo3sp wo1sc wo2sc wo3sc
1   hello   dan  mark  todd    10     5     4
2   hello  mark   dan chris     8     9     4
3 goodbye  mark   dan   kev     2     4    10
4    what   kev   dan  mark     4     5     5

The goal is to reshape the data from wide to long taking into account that there is some pattern in the column names. The expected output is

    phrase time    sp sc
1    hello    1   dan 10
2    hello    1  mark  8
3  goodbye    1  mark  2
4     what    1   kev  4
5    hello    2  mark  5
6    hello    2   dan  9
7  goodbye    2   dan  4
8     what    2   dan  5
9    hello    3  todd  4
10   hello    3 chris  4
11 goodbye    3   kev 10
12    what    3  mark  5

@docendodiscimus provided a sound solution using melt from data.table but I'd like to use reshape() from the stats library for the sake of practice.

This function is certainly quite powerful but I almost always run into a problem with one of its arguments, so this time it's the new.row.names argument which I rarely used before.

I tried

reshape(
  dat,
  idvar = "phrase",
  varying = list(
    "sp" = grep("sp$", names(dat)), 
    "sc" = grep("sc$", names(dat))
  ),
  direction = "long", 
  v.names = c("sp", "sc") # name of cols in long format
)

This returns the error

error in row.names<-.data.frame(*tmp*, value = paste(ids, times[i], : duplicate 'row.names' are not allowed

In addition: Warning message: non-unique value when setting 'row.names': ‘hello.1’

Reading the error message I found the 'solution' to be the new.row.names argument which I set to be 1:12, see below. (I cheated here because I looked how many rows were returned by the data.table solution.)

My question is what would be the generic solution to this problem?

# works!
reshape(
  dat,
  idvar = "phrase",
  varying = list(
    "sp" = grep("sp$", names(dat)),
    "sc" = grep("sc$", names(dat))
  ),
  direction = "long", 
  v.names = c("sp", "sc"),
  new.row.names = 1:12 # 1:10000 would also work
)

data

dat <- structure(list(phrase = c("hello", "hello", "goodbye", "what"
), wo1sp = c("dan", "mark", "mark", "kev"), wo2sp = c("mark", 
"dan", "dan", "dan"), wo3sp = c("todd", "chris", "kev", "mark"
), wo1sc = c(10L, 8L, 2L, 4L), wo2sc = c(5L, 9L, 4L, 5L), wo3sc = c(4L, 
4L, 10L, 5L)), .Names = c("phrase", "wo1sp", "wo2sp", "wo3sp", 
"wo1sc", "wo2sc", "wo3sc"), class = "data.frame", row.names = c(NA, 
-4L))
markus
  • 25,843
  • 5
  • 39
  • 58
  • You need to add a variable that enables creation of a _unique_ id. Thus: `dat$id <- ave(dat$phrase, dat$phrase, FUN = seq_along)`. And then `idvar = c("phrase", "id")`. In the result, you can see the autogenerated rownames, now unique. – Henrik Jan 07 '19 at 22:24
  • @Henrik Thanks for your comment. Creating another id column is a good hint. I thought it could work without it. Compared to the elegant data.table solution I found that a little cumbersome but if that's the way it is, please turn your comment into an answer and I'll be happy to accept. – markus Jan 07 '19 at 22:36
  • Also, `reshape` works best when your variables are named in the pattern `sp1`,`sp2`... and `sc1`,`sc2`... etc (i.e. - name+NUMBER) - then you can do things like: `reshape(dat, idvar=c("phrase","id"), varying=2:7, sep="", direction="long")` and all the `varying` selections are done automatically. – thelatemail Jan 07 '19 at 22:41
  • @thelatemail Thanks! Very useful comment as well. `reshape` and I are in this love-hate relationship and even after years we don't really know each other. But we try to keep going. – markus Jan 07 '19 at 22:46
  • Or with hardcoded column positions in the two "groups": `reshape(dat, idvar = c("phrase", "id"), varying = list(sp = 2:4, sc = 5:7), direction = "long")`. Good'ol `reshape` rocks. – Henrik Jan 07 '19 at 22:46
  • @Henrik "Good'ol `reshape` rocks" - yeah, that's what I am thinking too. But you know, all these arguments. – markus Jan 07 '19 at 22:49
  • 1
    @markus I know, the arguments are the funniest part of it..;) But, I was so glad when I for once sake actually meant that it rocks ([here](https://stackoverflow.com/a/41884029/1851712)) and was able to write "This is one of the (rare) instances where I believe good'ol `base::reshape` is cleaner [than `data.table::melt`]". Note the "rare"... – Henrik Jan 07 '19 at 23:00

0 Answers0