1

I would like to undo a reshape after converting a long data frame to wide-format by generating numbered versions of single variables. The challenge I'm facing is doing this when there are multiple key variables and multiple variables that need to get re-combined. I've tried using gather from tidyr to no avail. Take this example of long data:

 toy = data.frame(
   first_key = rep(c("A", "B", "C"), each = 6),
   second_key = rep(rep(c(1:2), each = 3), 3),
   colors = c("red", "yellow", "green", "blue", "purple", "beige"),
   days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 
   index = c(1:3)
 )

Which gives the following data.frame:

first_key second_key colors      days index
       A          1    red    Monday     1
       A          1 yellow   Tuesday     2
       A          1  green Wednesday     3
       A          2   blue  Thursday     1
       A          2 purple    Friday     2
       A          2  beige  Saturday     3
       B          1    red    Monday     1
       B          1 yellow   Tuesday     2
       B          1  green Wednesday     3
       B          2   blue  Thursday     1
       B          2 purple    Friday     2
       B          2  beige  Saturday     3
       C          1    red    Monday     1
       C          1 yellow   Tuesday     2
       C          1  green Wednesday     3
       C          2   blue  Thursday     1
       C          2 purple    Friday     2
       C          2  beige  Saturday     3

Reshaping it to a wide format with numbered versions of the variables looks like this:

toy_wide = reshape(toy, idvar = c("first_key", "second_key"),
           timevar = "index", direction = "wide", sep = "_")

And gives this wide format:

first_key second_key colors_1   days_1 colors_2  days_2 colors_3    days_3
       A          1      red   Monday   yellow Tuesday    green Wednesday
       A          2     blue Thursday   purple  Friday    beige  Saturday
       B          1      red   Monday   yellow Tuesday    green Wednesday
       B          2     blue Thursday   purple  Friday    beige  Saturday
       C          1      red   Monday   yellow Tuesday    green Wednesday
       C          2     blue Thursday   purple  Friday    beige  Saturday

But how do I get it back to the original format? I've tried the following but I get an error.

tidyr::gather(toy_wide, key = c("first_key", "second_key"), value = c("days", "colors"),
       colors_1:days_3, factor_key = TRUE)

Error: Invalid column specification

Nancy
  • 3,989
  • 5
  • 31
  • 49

2 Answers2

4

If you use reshape to go wide, use reshape to go long again:

reshape(toy_wide, idvar = c("first_key", "second_key"), timevar="index",
        varying=3:8, direction="long", sep="_")

#      first_key second_key index colors      days
#A.1.1         A          1     1    red    Monday
#A.2.1         A          2     1   blue  Thursday
# ...

If you specify the set of varying= variables (which can be a list of column values 3:8, column values to drop -(1:2), or column names as a character vector c("a","b") ) and a sep= then reshape will be able to guess the output variable names appropriately.

It's often helpful to do these kinds of reshapes in multiple steps to keep it clear and to automate things better:

ids <- c("first_key", "second_key")
reshape(toy_wide, idvar=ids, timevar="index",
        varying=setdiff(names(toy_wide), ids), direction="long", sep="_")
thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

Here is another option with melt from data.table which can take multiple measure patterns.

library(data.table)
melt(setDT(toy_wide), measure = patterns("^colors", "^days"), 
   value.name = c("colors", "days"), variable.name = "index")[order(first_key, second_key)]
akrun
  • 874,273
  • 37
  • 540
  • 662