1

I've the following data:

mydata <- data.frame(VAR1 = c("A",  "A",    "A",    "A",    "A",    "A",    "B",    "B",    "B",            
                              "B",  "B",    "B",    "C",    "C",    "C",    "C",    "C",    "C",            
                              "D",  "D",    "D",    "D",    "D",    "D",    "E",    "E",    "E",    "E",    "E",    "E"), 
                     VAR2 = c("A1", "A1",   "A1",   "A2",   "A2",   "A2",   "B1",   "B1",   "B1",           
                              "B2", "B2",   "B2",   "C1",   "C1",   "C1",   "C2",   "C2",   "C2",           
                              "D1", "D1",   "D1",   "D2",   "D2",   "D2",   "E1",   "E1",   "E1",   "E2",   "E2","E2"),
                     VAR3 = c("alpha",  "beta", "gamma",    "alpha",    "beta", "gamma",    "alpha",    "beta", "gamma",    "alpha",
                              "beta",   "gamma",    "alpha",    "beta", "gamma",    "alpha",    "beta", "gamma",    "alpha",    "beta",
                              "gamma",  "alpha",    "beta", "gamma",    "alpha",    "beta", "gamma",    "alpha",    "beta", "gamma"),
                     VALUE1 = c(56,  38, 83, 29, 73, 75, 99, 88, 30, 58,
                               97,  42, 64, 67, 45, 99, 62, 65, 7,  40,
                               49,  62, 82, 63, 65, 40, 45, 29, 73, 51),
VALUE2 = c(24,10,24 ,19 ,18,    6,  6,  9,  20,         
13, 5,  10, 13, 26, 13, 11, 26, 14,         
13, 15, 14, 19, 25, 22, 9,  10, 16, 17, 15, 11),
VALUE3 = c(69,  90, 91, 81, 66, 69, 76, 99,
77, 66, 88, 72, 69, 74, 85, 91,
69, 68, 73, 66, 74, 68, 90, 97,
83, 78, 88, 92, 66, 68))

I want to transpose this dataset into wide format, with values of VAR3 as columns & VALUE1,VALUE2 & VALUE3 are the values. So there will be 9 extra columns.

I can do it transposing each value column & then finally merge the 3 datasets. Can anyone please guide me how to do it in one step, without merging? I don't want to merge as the dataset is huge. So merging will be inefficient.

Thanks!

Beta
  • 1,638
  • 5
  • 33
  • 67

2 Answers2

4

Something like this?

library(tidyr)
mydata %>% 
  gather(key = key, value = value, VALUE1:VALUE3) %>% 
  unite(col = key_new, VAR3, key) %>% 
  spread(key = key_new, value = value)
Tino
  • 2,091
  • 13
  • 15
  • @Tino: This is perfect!! Can you please explain the steps as well. Actually I & surely many others come across this problem regularly. But one don't find the right solution to it. So if the answer is more detailed it will help many other people as well. But I'm marking your answer as answered. Thanks! – Beta Feb 14 '18 at 05:48
  • 1
    @Beta glad it works for you. You first have to collect the 3 variables `VALUE1` to `VALUE3` in one by stacking using `gather` (making the data long). Then you `unite` the previous column names which are now in `key`-variable with the `VAR3` to get/prepare the new column names (all combination, thus 9). Then you transform back to wide, which gives you the 9 columns. You can evaluate the code step-by-step to see what's happening. – Tino Feb 14 '18 at 05:54
  • 1
    @Tino: Thanks for the explaination. It's very helpful to me & anyone else who will refer this solution. Thanks again! – Beta Feb 14 '18 at 06:34
2

With dcast from data.table we can do this without converting to 'long' format as it can take multiple value.var columns

library(data.table)
dcast(setDT(mydata), VAR1 + VAR2 ~ VAR3, value.var = c('VALUE1', 'VALUE2', 'VALUE3'))

-output

# VAR1 VAR2 VALUE1_alpha VALUE1_beta VALUE1_gamma VALUE2_alpha VALUE2_beta VALUE2_gamma VALUE3_alpha VALUE3_beta VALUE3_gamma
# 1:    A   A1           56          38           83           24          10           24           69          90           91
# 2:    A   A2           29          73           75           19          18            6           81          66           69
# 3:    B   B1           99          88           30            6           9           20           76          99           77
# 4:    B   B2           58          97           42           13           5           10           66          88           72
# 5:    C   C1           64          67           45           13          26           13           69          74           85
# 6:    C   C2           99          62           65           11          26           14           91          69           68
# 7:    D   D1            7          40           49           13          15           14           73          66           74
# 8:    D   D2           62          82           63           19          25           22           68          90           97
# 9:    E   E1           65          40           45            9          10           16           83          78           88
#10:    E   E2           29          73           51           17          15           11           92          66           68
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Thanks a lot for your answer. This answer is slightly better than Tino's. My data size is large. So if I first convert the data into long format & then transform it I might get memory error. But since I already marked Tino's answer as correct, I don't want to change it. But thanks again! This is what I was looking for. – Beta Feb 14 '18 at 06:33
  • @Beta No problem. Thank you for the comment. The `data.table` `melt` and `dcast` have arguments to take multiple patterns or value.names respectively – akrun Feb 14 '18 at 06:34
  • One more favor from you. Will it be possible for you to refer me a good resource to learn `data.table` uses. I could not find good ones which is more detailed oriented. I want to learn `data.table` better. Thanks! – Beta Feb 14 '18 at 06:36
  • @Beta Not sure if you have read the [vignettes](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) yet. It would be a good starting point. Also, another option if you are looking for some structured learning might be to check the courses in `datacamp` (BTW, I am not affiliated with the datacamp) – akrun Feb 14 '18 at 06:38
  • 1
    Thanks for the links. I tried `datacamp`. But after one course it's paid. So I didn't proceed with it. `Vignettes` seems useful. Don't know why I didn't look into it. Thanks a lot. You had been most helpful as always!!! – Beta Feb 14 '18 at 06:41
  • 1
    @akrun looking at these kind of solutions always reminds me to eventually learn `data.table`. I guess it is also superior regarding memory useage and speed... Anyway, thanks for this! – Tino Feb 14 '18 at 06:52
  • 1
    @Tino Thanks. Though, I don't have any preference, I have used data.table in production when efficiency is very important) and `tidyverse` where the code readability is more important – akrun Feb 14 '18 at 06:55