tidyr pivot_longer: handling multiple observations and values per row

Question

I have an excel file I need to read that has multiple observations and values per row, with complicated names. It looks something like this when you load in:

library(tidyverse)
library(janitor)

# An input table read from xlsx, with a format similar to this
# An input table read from xlsx, with a format similar to this
input_table <- tribble(~"product" , 
                       ~"Price Store 1 ($1000/unit)",
                       ~"Quantity in Store 1 (units)",
                       ~"Price Store 2 ($1000/unit)",
                       ~"Quantity in Store 2 (units)",
                       'product a', 10, 100, 20, 70,
                       'product b', 30, 10, 35, 10)

I want to use some form of gather/pivot_longer to make it tidy, and have an output that looks like this:

# Desired output
output_table <- tribble(~'product',~'store',~'price',~'quantity',
                        'product a', 1, 10, 100,
                        'product a', 2, 20, 70,
                        'product b', 1, 30, 10,
                        'product b', 2, 35, 10)

Is there an easy way to get there using pivot_longer? Extracting the key number (in this case, store) would probably need some complex regex that I don't know how to create.

Ronak Shah · Accepted Answer · 2019-12-11T00:13:31.217

3

Yes, we can do

tidyr::pivot_longer(input_table, 
                   cols = -product, 
                   names_to = c(".value", "Store"),
                   names_pattern =  "(\\w+).*?(\\d)")

#  product   Store Price Quantity
#  <chr>     <chr> <dbl>    <dbl>
#1 product a 1        10      100
#2 product a 2        20       70
#3 product b 1        30       10
#4 product b 2        35       10

We get the column names (Price or Quantity) along with store number using names_pattern. The first word (\\w+) is the column name whereas first digit coming after it (\\d) is considered as store number.

edited Dec 11 '19 at 00:13

answered Dec 11 '19 at 00:03

Ronak Shah

377,200
20
156
213

That looks great. Can you explain the regex? – Leo Barlach Dec 11 '19 at 00:09
1

@LeoBarlach I added some explanation for the regex. – Ronak Shah Dec 11 '19 at 00:14

akrun · Answer 2 · 2019-12-11T00:25:37.870

We can use the names_pattern in pivot_longer to match one or more letters followed by characters that are not a digit and capture the digit

library(tidyr)
pivot_longer(input_table, cols = -product, 
               names_to = c(".value", "Store"),
                names_pattern =  "([A-Za-z]+)[^0-9]+([0-9])")
# A tibble: 4 x 4
#  product   Store Price Quantity
#  <chr>     <chr> <dbl>    <dbl>
#1 product a 1        10      100
#2 product a 2        20       70
#3 product b 1        30       10
#4 product b 2        35       10

tidyr pivot_longer: handling multiple observations and values per row

2 Answers2