R / tidyr/dplyr - transpose repeated rows by key into columns with new variable names

Question

Edit: This is a similar question to this one but I'm explicitly after a tidyr/dplyr approach.

I'm curious if there's a neat "tidyr/dplyr" way of doing this kind of transposing?

I've got different (by ID) observations (v2,v2,v3) of the same phenomenon (key) currently in "long" format, and for presentation, I'd like a wide format with one phenomenon (key) per row, but each set of observations (ID,v1,v2,v3) in repeated sets of columns with appropriately incremented variable names

In this case, I know that there are only going to be 2 IDs so I split it into two frames of tables and joined them.

I'd like any pointers on a general tidyr way of turning:

key ID      v1      v2      v3
32  blue    8.550   0.782   78.281
32  green   9.200   1.680   95.354
22  orange  6.100   -0.143  44.320
22  pink    6.500   0.672   74.920
100 green   4.500   -0.460  32.280
100 blue    8.000   0.506   69.372

Into:

key IDa     v1       v2     v3      IDb     v1b     v2b     v3b
32  blue    8.550    0.782  78.281  green   9.200   1.680   95.354
22  orange  6.100   -0.143  44.320  pink    6.500   0.672   74.920
100 green   4.500   -0.460  32.280  blue    8.000   0.506   69.372

Thanks!

score 2 · Accepted Answer · answered Oct 06 '17 at 15:10

2

You can create an id column for each key and then use data.table::dcast which is capable of pivoting multiple columns:

df %>% 
    group_by(key) %>% 
    mutate(n = row_number()) %>% 
    {data.table::dcast(data = setDT(.), key ~ n, value.var = c('ID', 'v1', 'v2', 'v3'))}

#   key   ID_1  ID_2 v1_1 v1_2   v2_1  v2_2   v3_1   v3_2
#1:  22 orange  pink 6.10  6.5 -0.143 0.672 44.320 74.920
#2:  32   blue green 8.55  9.2  0.782 1.680 78.281 95.354
#3: 100  green  blue 4.50  8.0 -0.460 0.506 32.280 69.372

answered Oct 06 '17 at 15:10

Psidom

209,562
33
339
356

Hadn't come across data.table/dcast before - very useful! I marked this one as "answered" as it's a more compact solution than the tidyr/dplyr one below, even though I initially found that one easier to follow. – Bantamug Oct 09 '17 at 16:22

score 2 · Answer 2 · answered Oct 06 '17 at 16:03

This solution relies on tidyr and dplyr only. The key here is to use tidyr::unite to create the key variable to spread with.

library(dplyr)
library(tidyr)

df %>% 
  group_by(key) %>% 
  mutate(suffix = letters[1:n()]) %>%
  gather(var, val, -c(key, suffix)) %>%
  unite(var_group, var, suffix, sep = "") %>%
  spread(var_group, val) %>%
  select(key, ends_with("a"), ends_with("b"))
#> # A tibble: 3 x 9
#> # Groups:   key [3]
#>     key    IDa   v1a    v2a    v3a   IDb   v1b   v2b    v3b
#> * <int>  <chr> <chr>  <chr>  <chr> <chr> <chr> <chr>  <chr>
#> 1    22 orange   6.1 -0.143  44.32  pink   6.5 0.672  74.92
#> 2    32   blue  8.55  0.782 78.281 green   9.2  1.68 95.354
#> 3   100  green   4.5  -0.46  32.28  blue     8 0.506 69.372

score 0 · Answer 3 · answered Oct 06 '17 at 15:34

Consider a base R solution with merge using the reverse duplicate join (a borrowed SQL method). However, you need to create a helper column for row number for the reverse duplicate subset and to yield distinct IDa values. Below transform is used to remove the helper row columns.

txt = "key ID      v1      v2      v3
32  blue    8.550   0.782   78.281
32  green   9.200   1.680   95.354
22  orange  6.100   -0.143  44.320
22  pink    6.500   0.672   74.920
100 green   4.500   -0.460  32.280
100 blue    8.000   0.506   69.372"

df <- read.table(text=txt, header=TRUE, stringsAsFactors = FALSE)
df$row <- row.names(df)

mdf <- transform(subset(merge(df, df, by="key", suffixes=c("a", "b")), rowa < rowb), rowa=NULL, rowb=NULL)

mdf
#    key    IDa  v1a    v2a    v3a   IDb v1b   v2b    v3b
# 2   22 orange 6.10 -0.143 44.320  pink 6.5 0.672 74.920
# 6   32   blue 8.55  0.782 78.281 green 9.2 1.680 95.354
# 10 100  green 4.50 -0.460 32.280  blue 8.0 0.506 69.372

R / tidyr/dplyr - transpose repeated rows by key into columns with new variable names

3 Answers3