I have a data frame with columns id, price1, price2, price3, prob1, prob2, prob3. I want to convert it from wide to long format, so that the price and prob columns each collapse into a single column.

library(dplyr)
library(data.table)


a <- data.table("id" = c(1,2,4),
                "price1"=c(1.2,2.44,5.6),
                "price2"=c(7.6,8,65),
                "price3"=c(1.2,4.5,7.8),
                "prob1"=c(0.1,0.3,0.5),
                "prob2"=c(0.3,0.35,0.75),
                "prob3"=c(0.18,0.31,0.58))

> a
  id price1 price2 price3 prob1 prob2 prob3
1  1   1.20    7.6    1.2   0.1  0.30  0.18
2  2   2.44    8.0    4.5   0.3  0.35  0.31
3  4   5.60   65.0    7.8   0.5  0.75  0.58

I want to transform table a into:

b <- data.table("id"=c(1,1,1,2,2,2,4,4,4),
                "order"=c(1,2,3,1,2,3,1,2,3),
                "price"=c(1.20,7.6,1.2,2.44,8.0,4.5,5.60,65.0,7.8),
                "prob"=c(0.1,0.30,0.18,0.3,0.35,0.31,0.5,0.75,0.58))

> b
   id order price prob
1:  1     1  1.20 0.10
2:  1     2  7.60 0.30
3:  1     3  1.20 0.18
4:  2     1  2.44 0.30
5:  2     2  8.00 0.35
6:  2     3  4.50 0.31
7:  4     1  5.60 0.50
8:  4     2 65.00 0.75
9:  4     3  7.80 0.58

Here order is the sequence number taken from the price and prob column names; without it, the pairing between each price and its prob would be lost after reshaping. I want to do this transformation in sparklyr.

Yashwanth

1 Answer

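You can use pivot_longer with names_pattern. The first capture group in the pattern is the column-name prefix (price or prob), which '.value' turns into output columns; the second capture group is the trailing number, which goes into order. Note that order comes back as character in the output below.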

tidyr::pivot_longer(a, cols = -id, 
                    names_to = c('.value', 'order'), 
                    names_pattern = '(.*?)(\\d+)')

# A tibble: 9 x 4
#     id order price  prob
#  <dbl> <chr> <dbl> <dbl>
#1     1 1      1.2  0.1  
#2     1 2      7.6  0.3  
#3     1 3      1.2  0.18 
#4     2 1      2.44 0.3  
#5     2 2      8    0.35 
#6     2 3      4.5  0.31 
#7     4 1      5.6  0.5  
#8     4 2     65    0.75 
#9     4 3      7.8  0.580
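
To see how the pattern splits each column name, here is a minimal sketch using base R's sub() with perl = TRUE. It only illustrates the regular expression; it is not how tidyr applies it internally, and the variable names (cols, value_part, order_part) are made up for this example.

```r
# Column names from the question's table `a`, excluding id
cols <- c("price1", "price2", "price3", "prob1", "prob2", "prob3")

# Same pattern as in the pivot_longer call:
# group 1 = shortest possible prefix, group 2 = the trailing digits
pattern <- "(.*?)(\\d+)"

# Keep only group 1 (becomes the output column name via '.value')
value_part <- sub(pattern, "\\1", cols, perl = TRUE)

# Keep only group 2 (becomes the value in the 'order' column)
order_part <- sub(pattern, "\\2", cols, perl = TRUE)

print(data.frame(column = cols, value_name = value_part, order = order_part))
```

Each name is cut at the point where its digits start, so price1 and prob1 both map to order "1" and end up on the same output row.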
Ronak Shah