Reshape with concatenation to have a single time series for each variable in R

Question

Would like to reshape a data set with the structure below in a way that would help me create a data set for time series analysis

The data set below is an example and I have multiple variables as columns and Multiple Brands as rows along with their respective timeperiods

Brand Period V1 V2 V3 A Week1 1 2 3 A Week2 1 2 3 A Week3 1 2 3 B Week1 1 2 3 B Week2 1 2 3 B Week3 1 2 3 C Week1 1 2 3 C Week2 1 2 3 C Week3 1 2 3

the data set would look like this :

Period A_V1 A_V2 A_V3 B_V1 B_V2 B_V3 C_V1 C_V2 C_V3 Week1 Week2 Week3

Was wondering if there is some function in the reshape package or anyother package I can use

Look at `dcast` from `data.table` i.e. `dcast(setDT(df1), Period ~Brand, value.var = c("V1", "V2", "V3"))` — akrun, Nov 22 '17 at 09:43
hey looking at a solution where I donot need to type in the variable names for each case, like "V1", "V2" etc — user36176, Nov 22 '17 at 09:54

G. Grothendieck · Answer 1 · 2017-11-22T12:53:17.467

The basic operation can be done in one read.zoo call which will:

read in the brands.dat file (defined reproducibly in the Note at the end) -- if you have a data frame as input instead then use the commented out line below instead of the uncommented line,
split the data by Brand.

The result is a zoo series z. The series can be manipulated directly in that form or it can be converted to a data frame using fortify.zoo(z) or a ts series by converging the index to numeric (as shown later) and then using as.ts(z).

library(zoo)

# z <- read.zoo(brands, index = 2, split = 1, FUN = as.character, header = TRUE)
z <- read.zoo("brands.dat", index = 2, split = 1, FUN = as.character, header = TRUE)

giving:

      V1.A V2.A V3.A V1.B V2.B V3.B V1.C V2.C V3.C
Week1    1    2    3    1    2    3    1    2    3
Week2    1    2    3    1    2    3    1    2    3
Week3    1    2    3    1    2    3    1    2    3

If you prefer column names exactly in the form shown in the question add this:

colnames(z) <- sub("(\\w+)[.](\\w+)", "\\2_\\1", colnames(z))

If you prefer a numeric time index or want to convert it to a ts series (which would require such) then add this:

time(z) <- 1:nrow(z)

or this:

time(z) <- as.numeric(gsub("\\D", "", time(z))

Note: This generates the input file:

Lines <- "
Brand   Period  V1  V2  V3
A      Week1    1   2   3
A      Week2    1   2   3
A      Week3    1   2   3
B      Week1    1   2   3
B      Week2    1   2   3
B      Week3    1   2   3
C      Week1    1   2   3
C      Week2    1   2   3
C      Week3    1   2   3"
cat(Lines, file = "brands.dat")

or if your starting point is a data frame then:

brands <- read.table(text = Lines, header = TRUE)

This works as as well, but I find @akrun solution, more feasible at the moment. Btw could you explain this line of code, if you dont mind? colnames(z) <- sub("(\\w+)[.](\\w+)", "\\2_\\1", colnames(z)) — user36176, Nov 23 '17 at 09:57
Match one or more word characters followed by a dot followed by one or more word characters. The parentheses define capture groups. The replacement string then specifies the match to the second capture group, an underscore and the match to the first capture group. — G. Grothendieck, Nov 23 '17 at 13:05

score 1 · Accepted Answer · answered Nov 23 '17 at 06:18

We can use dcast from data.table that takes multiple value.var columns

library(data.table)
dcast(setDT(df1), Period ~ Brand, value.var =names(df1)[3:5])
#   Period V1_A V1_B V1_C V2_A V2_B V2_C V3_A V3_B V3_C
#1:  Week1    1    1    1    2    2    2    3    3    3
#2:  Week2    1    1    1    2    2    2    3    3    3
#3:  Week3    1    1    1    2    2    2    3    3    3

score 0 · Answer 3 · answered Nov 22 '17 at 10:10

If you're used to the tidyverse, you could use a combination of gather and spread from tidyr for this (similar to this answer):

Brand <- c(rep("A", 3), rep("B", 3), rep("C", 3))
Period <- c(rep(c("Week1", "Week2", "Week3"), 3))
V1 <- c(rep(1, 9))
V2 <- c(rep(2, 9))
V3 <- c(rep(3, 9))
df <- data.frame(cbind(Brand, Period, V1, V2, V3))

df %>% 
  gather(vars, value, -Brand, -Period) %>% 
  mutate(observation = paste(Brand, vars, sep="_")) %>% 
  select(-Brand, -vars) %>% 
  spread(observation, value)

Results in:

  Period A_V1 A_V2 A_V3 B_V1 B_V2 B_V3 C_V1 C_V2 C_V3
1 Week1     1    2    3    1    2    3    1    2    3
2 Week2     1    2    3    1    2    3    1    2    3
3 Week3     1    2    3    1    2    3    1    2    3

Reshape with concatenation to have a single time series for each variable in R

3 Answers3