3

I have the following data

dat <- data.frame(id         = c("A", "B", "C"),
                  Q1r1_pepsi = c(1,0,1),
                  Q1r1_cola  = c(0,0,1),
                  Q1r2_pepsi = c(1,1,1),
                  Q1r2_cola  = c(0,1,1),
                  stringsAsFactors = FALSE)

where Q1r1 and Q1r2 are rating questions in a survey and pepsi and cola are the brands being rated. So I have two ratings (r1 and r2) for two brands (pepsi, cola):

id      Q1r1_c1    Q1r1_c2    Q1r2_c1    Q1r2_c2
"A"     1          0          1          0
"B"     0          0          1          1
"C"     1          1          1          1

(Side question: how do I format a SO post so that it correctly contains the nicely formatted output that I would get when calling dat in the R Console?)

To analyze the data I need to reshape (pivot) the data so that rows indicate unique rating-brand pairs. Thus, the expected outcome would be:

id      brand   Q1r1    Q1r2
"A"     "pepsi" 1       1
"A"     "cola"  0       0
"B"     "pepsi" 0       1
"B"     "cola"  0       1
"C"     "pepsi" 1       1
"C"     "cola"  1       1

Currently, I always do a combination of pivot_longer and pivot_wider, but I was hoping that I can directly get this result by pivoting_longer without doing the intermediate step:

library(tidyverse)

dat_long <- dat %>%
  pivot_longer(cols = starts_with("Q1")) %>%
  separate(name, into = c("item", "brand"), remove = FALSE)

dat_wide <- dat_long %>%
  pivot_wider(id_cols = c(id, brand),
              names_from = item,
              values_from = value)

With this current example it's still ok to do this intermediate step, but it gets tiresome in other less clean examples, e.g. suppose my columns weren't named in a nice structure with Q1r1_c1, Q1r1_c2, Q1r2_c1, Q1r2_c2, but instead would be Q4, Q5, Q8r1, Q8r2 where the map would be between Q4 and Q8r1, and Q5/Q8r2, respectively.

akrun
  • 874,273
  • 37
  • 540
  • 662
deschen
  • 10,012
  • 3
  • 27
  • 50

3 Answers3

5

You can use :

tidyr::pivot_longer(dat, cols = -id, 
                   names_to = c('.value', 'brand'), 
                   names_sep = "_")


#  id    brand  Q1r1  Q1r2
#  <chr> <chr> <dbl> <dbl>
#1 A     pepsi     1     1
#2 A     cola      0     0
#3 B     pepsi     0     1
#4 B     cola      0     1
#5 C     pepsi     1     1
#6 C     cola      1     1
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks. Works like a charm and is much faster than my solution (which will play a role in my real data sets where I have hundreds of rating-brand combinations and thousands of rows). However, the code works nicely because of the structured naming. Any idea how to extend this to cases where there is no clear pattern? Like suppose I have the columns `Q4, Q5, Q8r1, Q8r2`, where Q4 and Q8r1 are pepsi ratings and Q5/Q8r2 are cola ratings and the expected output would be two columns: `Q4_Q5` and `Q8`? (naming of columns can differ if that makes things easier). – deschen Apr 22 '20 at 11:48
  • I know there is this `build_longer_spec` function for these tailored cases, but I didn't get it working yet. – deschen Apr 22 '20 at 11:49
  • I am not sure if you can apply this directly on such dataset. Probably, you can rename the columns with some pattern and then use `pivot_longer` like above. – Ronak Shah Apr 22 '20 at 12:39
  • I was thinking about that, too. First rename columns, apply the easy pivot_longer, then rename again (if necessary). However, with your code above I played around a bit with the `build_longer_spec` function. And now it works. So I replaced the `pivot_longer` from you by the `build_longer_spec` function, checked the structure and manually replaced the content by this new Q4Q5 and Q8 pattern and afterwards called the pivot_longer with that spec object. Works nicely, although I'm not entirely sure how generalizable it is and if it wouldn't indeed be easier with the renaming approach. – deschen Apr 22 '20 at 12:48
  • Can you add that as an answer? It would be interesting to see how you use `build_longer_spec` in a scenario when there is no clear pattern in column names? – Ronak Shah Apr 22 '20 at 13:33
  • Done. Another unrelated question, how do you get this nice formatted output here into the post, i.e. the grey part starting with the #s? – deschen Apr 22 '20 at 14:44
  • Ohh...that is just the same way as you add code, just start with `#`. You can look at it by clicking on edit under my post. – Ronak Shah Apr 22 '20 at 15:04
1

Following @Ronak Shah's suggestion, I'm pasting my code here in case my column names are less structured. I'm building up this code on the dat data I specified in my initial post.

names(dat) <- c("id", "Q4", "Q5", "Q8r1", "Q8r2")

spec <- data.frame(.name  = names(dat)[-1],
                   .value = c("Q4Q5", "Q4Q5", "Q8", "Q8"),
                   brand  = rep(c("pepsi", "cola"), 2),
                   stringsAsFactors = FALSE)

dat_long <- pivot_longer_spec(dat, spec)

This gives essentially the same result as for the data with my structured names, just that the names are different now.

Again, I'm not entirely sure how generalizable this approach is, but it worked in my case.

deschen
  • 10,012
  • 3
  • 27
  • 50
0

We can use gather/spread

library(tidyr)
library(dplyr)
dat %>%
    gather(brand, value, -id) %>%
    separate(brand, into = c('name', 'brand')) %>% 
    spread(name, value)
#  id brand Q1r1 Q1r2
#1  A  cola    0    0
#2  A pepsi    1    1
#3  B  cola    0    1
#4  B pepsi    0    1
#5  C  cola    1    1
#6  C pepsi    1    1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Thanks, I also thought about the good old gather/spread. However, since these are retired functions I don't want to actively rely on them anymore. They might still exist for quite some time, but still prefer the same functions in all my codes. – deschen Apr 22 '20 at 21:21