Melting a data frame into 3 columns

Question

I am trying to melt the following dataset into 3 columns: 'hour', 'variable', and 'cluster'.

> head(kpitdur)
       hr_0  hr_1030  hr_1130   hr_160   hr_180   hr_190   hr_200   hr_630   hr_830
1  79.08333 63.06667 63.06667 63.06667 63.06667 63.06667 63.06667 65.73333 63.06667
2  71.45000 51.80000 51.80000 51.80000 51.80000 51.80000 51.80000 71.45000 51.80000
3  86.96667 56.91667 56.91667 56.91667 56.91667 56.91667 56.91667 69.00000 56.91667
4  91.53333 77.38333 61.83333 77.38333 77.38333 77.38333 77.38333 77.38333 77.38333
5  91.83333 78.10000 78.10000 78.10000 78.10000 78.10000 78.10000 78.10000 78.10000
6 111.41667 65.75000 65.75000 65.75000 65.75000 65.75000 65.75000 80.63333 65.75000
    hr_930 cluster
1 63.06667       2
2 51.80000       2
3 56.91667       2
4 77.38333       1
5 78.10000       1
6 65.75000       1

However, when I use the following call to melt, I get only two columns, and I am not sure how to work around this. I have tried passing different names to value.name, but that does not work. How can I melt this dataset into three separate columns?

> melted <- melt(kpitdur, value.name = "cluster")
No id variables; using all as measure variables

> head(melted)
  variable   cluster
1     hr_0  79.08333
2     hr_0  71.45000
3     hr_0  86.96667
4     hr_0  91.53333
5     hr_0  91.83333
6     hr_0 111.41667

> tail(melted)
      variable cluster
11699  cluster       1
11700  cluster       1
11701  cluster       1
11702  cluster       2
11703  cluster       1
11704  cluster       1

Here is a sample of the data:

> dput(df)
structure(list(hr_0 = c(79.0833333333333, 71.45, 86.9666666666667, 
91.5333333333333, 91.8333333333333, 111.416666666667), hr_1030 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_1130 = c(63.0666666666667, 
51.8, 56.9166666666667, 61.8333333333333, 78.1, 65.75), hr_160 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_180 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_190 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_200 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_630 = c(65.7333333333333, 
71.45, 69, 77.3833333333333, 78.1, 80.6333333333333), hr_830 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), hr_930 = c(63.0666666666667, 
51.8, 56.9166666666667, 77.3833333333333, 78.1, 65.75), cluster = c(2L, 
2L, 2L, 1L, 1L, 1L)), .Names = c("hr_0", "hr_1030", "hr_1130", 
"hr_160", "hr_180", "hr_190", "hr_200", "hr_630", "hr_830", "hr_930", 
"cluster"), row.names = c(NA, 6L), class = "data.frame")

`No id variables; using all as measure variables` was a clue — rawr, Jun 22 '16 at 12:09
gets you 90% of the way there: `reshape(kpitdur, varying = names(kpitdur)[1:10], direction = "long", v.names = "hour")` ; just change `time` to the variable names — Hack-R, Jun 22 '16 at 12:13

score 0 · Answer 1 · answered Jun 22 '16 at 12:05

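You could use base R's reshape(), treating every column except the 11th (cluster) as varying. The trailing [-4] drops the id column that reshape() adds: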

reshape(
  data = kpitdur, 
  varying = -11, 
  direction = "long", 
  sep="_", 
  timevar = "variable"
)[-4]
#        cluster variable        hr
# 1.0          2        0  79.08333
# 2.0          2        0  71.45000
# 3.0          2        0  86.96667
# 4.0          1        0  91.53333
# 5.0          1        0  91.83333
# 6.0          1        0 111.41667
# 1.1030       2     1030  63.06667
# 2.1030       2     1030  51.80000
# 3.1030       2     1030  56.91667
# 4.1030       1     1030  77.38333
# ...

score 0 · Answer 2 · answered Jun 22 '16 at 12:10

You could also do it with the tidyr package:

library(tidyr)

# Name the value column "value" directly, so it does not clash with the existing cluster column
data_long <- gather(df, variable, value, hr_0:hr_930)

> head(data_long, 5)
  cluster variable    value
1       2     hr_0 79.08333
2       2     hr_0 71.45000
3       2     hr_0 86.96667
4       1     hr_0 91.53333
5       1     hr_0 91.83333

score 0 · Answer 3 · answered Jun 22 '16 at 12:10

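# Option 1 using reshape2
library(reshape2)
library(magrittr)  # provides %>%, which reshape2 does not export

# Without id.vars, melt() treats every column (including cluster) as a measure variable
df %>% 
    melt(value.name = "cluster")  # incorrect original result

# Keep cluster as the id column and name the key and value columns explicitly
df %>% 
    melt(id.vars = "cluster", variable.name = "hour", value.name = "variable")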

Or...

# Option 2 using tidyr
library(tidyr)

df %>% 
    gather(hour, variable, -cluster)
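
As an aside, in tidyr 1.0.0 and later gather() is superseded by pivot_longer(). A minimal sketch of the same reshape is below. It assumes a recent tidyr and uses a small made-up stand-in for the question's data (three rows, two hour columns, rounded values); the final sub() step that strips the "hr_" prefix is optional and not part of the original answer.

```r
library(tidyr)

# Small stand-in for the question's data (assumption: rounded, truncated values)
df <- data.frame(
  hr_0    = c(79.08, 71.45, 86.97),
  hr_1030 = c(63.07, 51.80, 56.92),
  cluster = c(2L, 2L, 1L)
)

long <- pivot_longer(
  df,
  cols      = -cluster,   # every column except the id column
  names_to  = "hour",     # former column names (hr_0, hr_1030, ...)
  values_to = "variable"  # the measured values
)

# Optional: strip the "hr_" prefix so hour holds only the label after it
long$hour <- sub("^hr_", "", long$hour)
```

The result has the three columns the question asks for: cluster, hour, and variable, with one row per original row and hour column.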
