Is it possible to use spread on multiple columns in tidyr similar to dcast?

Question

I have the following dummy data:

library(dplyr)
library(tidyr)
library(reshape2)
dt <- expand.grid(Year = 1990:2014, Product=LETTERS[1:8], Country = paste0(LETTERS, "I")) %>%   select(Product, Country, Year)
dt$value <- rnorm(nrow(dt))

I pick two product-country combinations

sdt <- dt %>% filter((Product == "A" & Country == "AI") | (Product == "B" & Country =="EI"))

and I want to see the values side by side for each combination. I can do this with dcast:

sdt %>% dcast(Year ~ Product + Country)

Is it possible to do this with spread from the package tidyr?

@jaap this question is not duplicate of the older one. The older one is about reshaping data in general, this one is about how to achieve certain pivoting within the specific package. — mpiktas, Dec 17 '19 at 07:06
This _**is**_ a dupe. It is true that the other question doens't focus on a specific question and therefore has attracted solutions with several packages, including the one asked for by you. This makes it a perfect target to close specific questions. — Jaap, Dec 17 '19 at 08:10
@Jaap I'd argue this is not a duplicate. This question is about creating the column names from multiple source columns, with a single column for values. Whereas the linked question is about picking values from multiple source columns, with one column used for names. Some answers to the linked question reshape the problem so that it can be solved with solutions to this question, but that doesn't make them duplicates. — Mikko Marttila, Feb 25 '22 at 12:12

akrun · Accepted Answer · 2015-08-01T14:32:51.640

61

One option would be to create a new 'Prod_Count' by joining the 'Product' and 'Country' columns by paste, remove those columns with the select and reshape from 'long' to 'wide' using spread from tidyr.

 library(dplyr)
 library(tidyr)
 sdt %>%
 mutate(Prod_Count=paste(Product, Country, sep="_")) %>%
 select(-Product, -Country)%>% 
 spread(Prod_Count, value)%>%
 head(2)
 #  Year      A_AI       B_EI
 #1 1990 0.7878674  0.2486044
 #2 1991 0.2343285 -1.1694878

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

 sdt%>% 
 unite(Prod_Count, Product,Country) %>%
 spread(Prod_Count, value)%>% 
 head(2)
 #   Year      A_AI       B_EI
 # 1 1990 0.7878674  0.2486044
 # 2 1991 0.2343285 -1.1694878

edited Aug 01 '15 at 14:32

answered Jul 24 '14 at 09:38

akrun

874,273
37
540
662

1

Yeah, this was the first thing that came to my mind. But it is not pretty :) – mpiktas Jul 24 '14 at 09:40
@mpiktas. I can only think of this method. May be somebody else will come up with compact code :)- – akrun Jul 24 '14 at 09:41
9

well there is `unite()` but it appears to only work with numeric data (on purpose though?). – erc Jul 24 '14 at 10:05
4

@beetroot, thanks, Yes, It seems to work `sdt%>% unite(Prod_Count, Product,Country) %>% spread(Prod_Count, value)%>% head()` – akrun Jul 24 '14 at 10:10
@akrun Ah yes great, I thought it's `c("Product", "Country")` and got an error message, my mistake then. – erc Jul 24 '14 at 10:16
27

This is the Hadley approved way of solving this problem ;) – hadley Jul 25 '14 at 21:55
5

After having consulted this thread multiple times the last months, I find the reshape2/dcast-based solution the most elegant. See also http://stackoverflow.com/questions/27418919/dplyr-with-subgroup-join, where the spread-based solution cannot be generalized to multiple grouping columns, but the reshape-based can. – Dieter Menne Jan 07 '15 at 11:02
1

@DieterMenne Thanks for the input. Yes, there are limitations in `spread`. I would also say that the `dcast` method has also limitations with multiple `value` columns. Heard that in the next release of `data.table`, it would be fixed. – akrun Jan 07 '15 at 11:18
7

@hadley this is an unusually ugly solution for the tidyverse. All the columns have to be listed out multiple times, and what's worse they lose types, so everything has to be cast back to numeric. – dfrankow May 24 '17 at 19:46

hplieninger · Answer 2 · 2019-09-19T09:37:23.237

With the new function pivot_wider() introduced in tidyr version 1.0.0, this can be accomplished with one function call.

pivot_wider() (counterpart: pivot_longer()) works similar to spread(). However, it offers additional functionality such as using multiple key/name columns (and/or multiple value columns). To this end, the argument names_from—that indicates from which column(s) the names of the new variables are taken—may take more than one column name (here Product and Country).

library("tidyr")

sdt %>% 
    pivot_wider(id_cols = Year,
                names_from = c(Product, Country)) %>% 
    head(2)
#> # A tibble: 2 x 3
#>     Year   A_AI    B_EI
#>    <int>  <dbl>   <dbl>
#>  1  1990 -2.08  -0.113 
#>  2  1991 -1.02  -0.0546

See also: https://tidyr.tidyverse.org/articles/pivot.html

score 0 · Answer 3 · answered Oct 26 '19 at 09:15

Base R solution:

 # Concatenate grouping vector: 

dt$PC <- paste0(dt$Product, "_", dt$Country)

# Spread the vectors by year: 

dt2 <- reshape(dt[,c(!(names(dt) %in% c("Product", "Country")))],

               idvar = "Year",

               ids = unique(dt$Year),

               direction = "wide",

               timevar = "PC")

# Remove "value.", from the vector names:

names(dt2) <- gsub("value[.]", "", names(dt2))

Data:

dt <- expand.grid(Year = 1990:2014,

                  Product = LETTERS[1:8],

                  Country = paste0(LETTERS, "I"))

dt$value <- rnorm(nrow(dt))

Is it possible to use spread on multiple columns in tidyr similar to dcast?

3 Answers3

Linked

Related