1

I am trying to reshape my dataset using dplyr gather and spread functions to move from this data shape:

library(tidyverse)
# test data set
df = tibble(factor = c("a","a","b","b"),
           factor2 = c("d1","d2","d1","d2"),
           value1 = round(rnorm(4),1)*10,
           value2 = round(runif(4),2)*100)

that looks like this:

# A tibble: 4 x 4
  factor factor2 value1 value2
  <chr>  <chr>    <dbl>  <dbl>
1 a      d1           4     97
2 a      d2         -21     10
4 b      d1          -2     65
5 b      d2         -14     93

to something that will look like this:

factor    d1val1   d1val2  d2val1  d2val2
a          4        97      -21     10
b         -2        65      -14     93

Ideally I would like to achieve this with dplyr spread / gather.

Uwe
  • 41,420
  • 11
  • 90
  • 134
Marek
  • 419
  • 1
  • 5
  • 13
  • 1
    Related [Convert data from long format to wide format with multiple measure columns](https://stackoverflow.com/questions/10589693/convert-data-from-long-format-to-wide-format-with-multiple-measure-columns) – markus Mar 14 '19 at 07:28
  • 1
    If you use random numbers to generate sample data, like calling `rnorm()`, please, call `set.seed(1)` beforehand to make sure that the dataset can be reproduced. Thank you. – Uwe Mar 14 '19 at 07:30

3 Answers3

3

For the sake of completeness, the implementation of dcast() is able to reshape multiple variables simultaneously:

library(data.table)
dcast(setDT(df), factor ~ factor2, value.var = c("value1", "value2"))
   factor value1_d1 value1_d2 value2_d1 value2_d2
1:      a         4       -21        97        10
2:      b        -2       -14        65        93
Uwe
  • 41,420
  • 11
  • 90
  • 134
2

An option would be to gather the 'value' columns to 'long' format, then unite the 'factor2' and 'key' column to create a single column, and spread it back to 'wide' format

library(dplyr)
library(tidyr)
df %>% 
  gather(key, val, value1:value2) %>% 
  unite(dcols, factor2, key, sep = "") %>%
  spread(dcols, val)

As the column types are the same, it is okay to do this with gather and spread

akrun
  • 874,273
  • 37
  • 540
  • 662
2

Another tidyverse possibility could be:

df %>% 
 gather(var, val, -c(factor, factor2)) %>%
 mutate(var = paste0(factor2, var)) %>%
 select(-factor2) %>%
 spread(var, val) 

  factor d1value1 d1value2 d2value1 d2value2
  <chr>     <dbl>    <dbl>    <dbl>    <dbl>
1 a            -4       85       -4       65
2 b             4       39       -1       20

It, first, transforms the data from wide to long format, excluding the variables "factor" and "factor2". Second, it combines together the values from "factor2" and the names of variables. Finally, it removes the redundant variables and returns the data to the desired format.

tmfmnk
  • 38,881
  • 4
  • 47
  • 67