Using base R to reformat wide data to long data more efficiently

Question

Other questions on reshape() helped me reformat my data (see below), but I'm wondering if there's a way to achieve what I did in a simpler way (less code) using base R, no packages. I have example data that looks like this:

period store_id product price1 price2 price3 quantity1 quantity2 quantity3
201801        1     1       11      5      6       100       200       300
201802        1     1       12      6      6       100       200       300
201803        1     1       13      7      6       100       200       300
201804        1     1       14      8      6       100       200       300
201805        1     1       15      9      6       100       200       300
201806        2     2       16     10      6       100       200       300
201807        2     2       17     11      6       100       200       300
201808        2     2       18     12      6       100       200       300
201809        2     2       19     13      6       100       200       300
201810        2     2       20     14      6       100       200       300

And I'd like for it to look something like this (every period has 3 rows, one column for price and one column for quantity):

period store_id  product quantity type   price 
201801        1     1     100    price1    11 
201801        1     1     200    price2     5 
201801        1     1     300    price3     6 
201802        1     1     100    price1    12 
201802        1     1     200    price2     6 
201802        1     1     300    price3     6 
201803        1     1     100    price1    13 
201803        1     1     200    price2     7 
201803        1     1     300    price3     6 
201804        1     1     100    price1    14 
and so on...

I was able to reformat my data by first splitting the data into two data frames, one for prices and one for quantities, and then merging them back together afterwards. Is there a simpler way to use reshape() for my example in less steps/code than what I did below?

Any help is appreciated, thanks.

CODE TO PRODUCE INITIAL DATA SET

period <- 201801:201810
df <- data.frame(period)
df$store_id <- c(1,1,1,1,1,2,2,2,2,2)
df$product <- c(1,1,1,1,1,2,2,2,2,2)
df$price1 <- 11:20
df$price2 <- 5:14
df$price3 <- df_test$price1-df_test$price2
df$quantity1 <- 100
df$quantity2 <- 200
df$quantity3 <- 300
df

MY INEFFICIENT CODE / SOLUTION

# prices
df_prices <- df[,!names(df)%in%c("quantity1","quantity2","quantity3")]

df_prices <- reshape(df_prices,               
                     direction = "long",     
                     idvar = c("period", "product", "store_id"), 
                     timevar = "type",           
                     varying = c("price1","price2","price3"),  
                     v.names = "price",     
                     times = c("price1","price2","price3"))

# quantities
df_quantities <- df[,!names(df)%in%c("price1","price2","price3")]

df_quantities <- reshape(df_quantities,               
                         direction = "long",     
                         idvar = c("period", "product", "store_id"), 
                         timevar = "type",           
                         varying = c("quantity1","quantity2","quantity3"),  
                         v.names = "quantity",     
                         times = c("quantity1","quantity2","quantity3"))

# make variables consistent to merge
df_quantities$type[df_quantities$type=="quantity1"] <- "price1"
df_quantities$type[df_quantities$type=="quantity2"] <- "price2"
df_quantities$type[df_quantities$type=="quantity3"] <- "price3"

# merge both
df <- merge(df_prices, df_quantities, by=c("period","product","store_id","type"))

Why is type having price instead of quantity? I believe it should just be `1,2,3` instead of price1,2,3`.. — Onyambu, May 30 '22 at 17:50
Your varying can be a list i.e `reshape(df, idvar = c('period', 'store_id'), direction = "long", varying = list(3:5, 6:8))` — akrun, May 30 '22 at 17:50
@akun Thanks so much for the tip on list in varying, that works perfectly. I mistakenly left out a variable before which now broke your solution (should now be list(4:6, 7:9)), but your line works perfectly. Thanks! — littleisrael, May 30 '22 at 17:57

score 1 · Accepted Answer · answered May 30 '22 at 17:58

You can specify varying as 4:9 ie columns 4 to column 9. And specify the sep='' ie nothing between time and 1 you have time1. if you had time_1 then sep = '_':

reshape(df, 4:9, dir = 'long', sep="")

     period store_id product time price quantity id
1.1  201801        1       1    1    11      100  1
2.1  201802        1       1    1    12      100  2
3.1  201803        1       1    1    13      100  3
4.1  201804        1       1    1    14      100  4
5.1  201805        1       1    1    15      100  5
6.1  201806        2       2    1    16      100  6
7.1  201807        2       2    1    17      100  7
8.1  201808        2       2    1    18      100  8
9.1  201809        2       2    1    19      100  9
10.1 201810        2       2    1    20      100 10
1.2  201801        1       1    2     5      200  1
2.2  201802        1       1    2     6      200  2
3.2  201803        1       1    2     7      200  3
4.2  201804        1       1    2     8      200  4
5.2  201805        1       1    2     9      200  5
6.2  201806        2       2    2    10      200  6
7.2  201807        2       2    2    11      200  7
8.2  201808        2       2    2    12      200  8
9.2  201809        2       2    2    13      200  9
10.2 201810        2       2    2    14      200 10
1.3  201801        1       1    3     6      300  1
2.3  201802        1       1    3     6      300  2
3.3  201803        1       1    3     6      300  3
4.3  201804        1       1    3     6      300  4
5.3  201805        1       1    3     6      300  5
6.3  201806        2       2    3     6      300  6
7.3  201807        2       2    3     6      300  7
8.3  201808        2       2    3     6      300  8
9.3  201809        2       2    3     6      300  9
10.3 201810        2       2    3     6      300 10

very slick solution, thanks! – littleisrael May 30 '22 at 18:01 — littleisrael, May 30 '22 at 18:01

Using base R to reformat wide data to long data more efficiently

1 Answers1