Consider the following dataset which is taken from this question: Going from wide to long w/ coupled-columns: Is there a more R way to do this (i.e. - without using a for loop)?
(dput
at the end of the question)
phrase wo1sp wo2sp wo3sp wo1sc wo2sc wo3sc
1 hello dan mark todd 10 5 4
2 hello mark dan chris 8 9 4
3 goodbye mark dan kev 2 4 10
4 what kev dan mark 4 5 5
The goal is to reshape the data from wide to long taking into account that there is some pattern in the column names. The expected output is
phrase time sp sc
1 hello 1 dan 10
2 hello 1 mark 8
3 goodbye 1 mark 2
4 what 1 kev 4
5 hello 2 mark 5
6 hello 2 dan 9
7 goodbye 2 dan 4
8 what 2 dan 5
9 hello 3 todd 4
10 hello 3 chris 4
11 goodbye 3 kev 10
12 what 3 mark 5
@docendodiscimus provided a sound solution using melt
from data.table
but I'd like to use reshape()
from the stats
library for the sake of practice.
This function is certainly quite powerful but I almost always run into a problem with one of its arguments, so this time it's the new.row.names
argument which I rarely used before.
I tried
reshape(
dat,
idvar = "phrase",
varying = list(
"sp" = grep("sp$", names(dat)),
"sc" = grep("sc$", names(dat))
),
direction = "long",
v.names = c("sp", "sc") # name of cols in long format
)
This returns the error
error in
row.names<-.data.frame
(*tmp*
, value = paste(ids, times[i], : duplicate 'row.names' are not allowedIn addition: Warning message: non-unique value when setting 'row.names': ‘hello.1’
Reading the error message I found the 'solution' to be the new.row.names
argument which I set to be 1:12
, see below. (I cheated here because I looked how many rows were returned by the data.table
solution.)
My question is what would be the generic solution to this problem?
# works!
reshape(
dat,
idvar = "phrase",
varying = list(
"sp" = grep("sp$", names(dat)),
"sc" = grep("sc$", names(dat))
),
direction = "long",
v.names = c("sp", "sc"),
new.row.names = 1:12 # 1:10000 would also work
)
data
dat <- structure(list(phrase = c("hello", "hello", "goodbye", "what"
), wo1sp = c("dan", "mark", "mark", "kev"), wo2sp = c("mark",
"dan", "dan", "dan"), wo3sp = c("todd", "chris", "kev", "mark"
), wo1sc = c(10L, 8L, 2L, 4L), wo2sc = c(5L, 9L, 4L, 5L), wo3sc = c(4L,
4L, 10L, 5L)), .Names = c("phrase", "wo1sp", "wo2sp", "wo3sp",
"wo1sc", "wo2sc", "wo3sc"), class = "data.frame", row.names = c(NA,
-4L))