I have a data frame like this:
structure(list(one = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), two = c(2, 4, 7, 3), x.1 = c("x1a",
"x1b", "x1c", "x1d"), x.2 = c("x2a", "x2b", "x2c", "x2d"), x.3 = c("x3a",
"x3b", "x3c", "x3d"), y.1 = c(NA, "y1b", "y1c", NA), y.2 = c(NA,
"y2b", "y2c", NA), y.3 = c(NA, "y3b", "y3c", NA)), .Names = c("one",
"two", "x.1", "x.2", "x.3", "y.1", "y.2", "y.3"), row.names = c(NA,
-4L), class = "data.frame")
As you can see, the observations for each event a, b, c and d (variable "one") are stored in wide format: the prefixes x and y identify separate observations, and the suffixes 1, 2 and 3 identify the variables. Variable "two" has no particular meaning here.
I would like to reshape this data frame into tidy form, so that each observation has its own row and each variable its own column.
The final data frame should look like this:
structure(list(one = structure(c(1L, 2L, 2L, 3L, 3L, 4L), .Label = c("a",
"b", "c", "d"), class = "factor"), two = c(2, 4, 2, 7, 5, 3),
var1 = c("x1a", "x1b", "y1b", "x1c", "y1c", "x1d"), var2 = c("x2a",
"x2b", "y2b", "x2c", "y2c", "x2d"), var3 = c("x3a", "x3b",
"y3b", "x3c", "y3c", "x3d")), .Names = c("one", "two", "var1",
"var2", "var3"), row.names = c(1L, 2L, 5L, 3L, 6L, 4L), class = "data.frame")
I am slightly familiar with what the cast and melt functions from the reshape package do, but I have not yet been able to figure out a smart way to reshape the data frame. This is as far as I have gotten:
df.between <- melt(df.in, id.vars=c("one", "two"))
df.between$variable <- gsub("^[xy]\\.", "", df.between$variable)
Now the "variable" column correctly identifies the variable (1, 2 or 3). However, I have not been able to cast this into the required form, and this approach does not seem practical for larger data sets because of the gsub call.
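To make the sticking point concrete, here is a rough sketch of where I suspect this has to go. It is not a verified solution. It assumes melt and dcast from the reshape2 package (rather than the older reshape package), and the names df.long, obs, var and df.out are made up for illustration. The idea is to keep the x/y prefix as its own column instead of discarding it, so that the cast can tell the two observations apart:

```r
library(reshape2)  # assumption: reshape2 is installed and provides melt/dcast

# Input data frame from above, rebuilt compactly
df.in <- data.frame(
  one = factor(c("a", "b", "c", "d")),
  two = c(2, 4, 7, 3),
  x.1 = c("x1a", "x1b", "x1c", "x1d"),
  x.2 = c("x2a", "x2b", "x2c", "x2d"),
  x.3 = c("x3a", "x3b", "x3c", "x3d"),
  y.1 = c(NA, "y1b", "y1c", NA),
  y.2 = c(NA, "y2b", "y2c", NA),
  y.3 = c(NA, "y3b", "y3c", NA),
  stringsAsFactors = FALSE
)

# Long format; rows whose value is NA (the missing y observations) are dropped
df.long <- melt(df.in, id.vars = c("one", "two"), na.rm = TRUE)

# Split "x.1" into the observation prefix ("x") and the variable name ("var1")
df.long$variable <- as.character(df.long$variable)
df.long$obs <- sub("\\..*$", "", df.long$variable)
df.long$var <- paste0("var", sub("^.*\\.", "", df.long$variable))

# One row per (one, two, obs) combination, one column per variable
df.out <- dcast(df.long, one + two + obs ~ var, value.var = "value")
```

In this sketch "two" is simply carried along as an id column, and the helper column obs could be dropped afterwards with df.out$obs <- NULL. Whether this scales to larger data sets is exactly what I am unsure about.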
I would be happy to get a nudge in the right direction here.