0

I have a data set with a single identifier and five columns that repeat 18 times. I want to restructure the data into long format keeping the first five column headings as the column headings. Below is a sample with just two repeats:

structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"), 
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"), 
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged", 
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L, 
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L, 
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral", 
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))

I attempted using the melt and patterns functions, which appears to approximate my desired outcome without the desired column headings:

df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))

Here is the result:

structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)

When I tried to specify names with value.name() below I receive an error:

df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))

My desired result would look like this:

structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"), 
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L, 
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged", 
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))
user3594490
  • 1,949
  • 2
  • 20
  • 26

1 Answers1

1

It looks to me like you embarked on a difficult journey by using melt: this function is well named in the sense that trying to use it will probably melt your brain. Joke aside, the function melt has lots of underlying computations and its use could be inefficient if you have a large dataset.

I would instead solve the problem manually with rbindlist (from the excellent package data.table, which also ships with an optimized version of melt if you really want to use it), to manually concatenates groups of columns. This also preserves the column names:

> rbindlist(lapply(1:2, function(i) df1[,c(1,((i-1)*5+2):((i-1)*5+6))]))
   Response.ID  Task   Freq Hours Value       Mood
1:           1 task1  Daily     3    10    Engaged
2:           2 task1  Daily     2     8    Neutral
3:           1 task2 Weekly     4    10 Optimistic
4:           2 task2 Weekly     4     6    Neutral

This works on your example: replace the indices 1:2 by the number of repetitions to make it work with the real dataset (so, lapply(1:18)).

Jealie
  • 6,157
  • 2
  • 33
  • 36