Convert data frame from long to wide format with lots of columns in R

Question

This may have already been answered, but I could not find exactly what I wanted.

I have a data frame like:

Area <- c(1,1,1,1,2,2,2,2,3,3,3,3)
Scenario <- c("a","b","c","d","a","b","c","d","a","b","c","d")
Type <- rep("EV", 12)
Y2020 <- c(0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6)
Y2021 <- c(0.2,0.4,0.5,0.6,0.8,1.0,1.0,1.1,1.2,1.5,1.3,1.5)
Y2022 <- c(0.3,0.6,0.2,0.7,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.6)

dt <- data.frame(Area, Scenario, Type, Y2020, Y2021, Y2022)

So it looks something like:

Area  Scenario Type Y2020  Y2021  Y2022
 1       a      EV   0.5    0.2    0.3
 1       b      EV   0.6    0.4    0.6
 1       c      EV   0.7    0.5    0.8
 1       d      EV   0.8    0.6    0.7
 2       a      EV   0.9    0.8    0.5
 2       b      EV   1.0    1.0    0.6
 2       c      EV   1.1    1.0    0.7
 2       d      EV   1.2    1.1    0.8
 3       a      EV   1.3    1.2    0.9
 3       b      EV   1.4    1.5    1.1
 3       c      EV   1.5    1.3    1.3
 3       d      EV   1.6    1.5    1.6

I would like to convert it to wide format by pivoting on the Scenario column, like this:

Area Type Y2020_a  Y2021_a  Y2022_a  Y2020_b  Y2021_b ...
 1    EV    0.5      0.2      0.3      0.6      0.4
 2    EV    0.9      0.8      0.5      1.0      1.0
 3    EV    1.3      1.2      0.9      1.4      1.5

I tried dcast(dt, id ~ Scenario, value.var = names(dt)[4:6]), as suggested by @Arun in "Reshape multiple values at once", but it returned "Error in .subset2(x, i, exact = exact) : recursive indexing failed at level 2".

This is a condensed version of my actual data, so a solution that also works on a larger data set with many more year columns would be great!

I hope someone can help! Thanks

Try looking at `tidyr::pivot_wider()`, docs here: https://tidyr.tidyverse.org/reference/pivot_wider.html. Functions from the tidyverse tend to be better maintained. — seagullnutkin, Feb 01 '21 at 15:40
@seagullnutkin Can you please provide some proofs to that claim? From my experience, tidyverse functions tend to change their whole API very frequently. — David Arenburg, Feb 01 '21 at 15:42
@DavidArenburg they have made a bunch of recent updates to functions in the tidyverse, but it's maintained by RStudio, so I think that tidyverse packages do get more attention/maintenance from a larger group of people who are invested in keeping it working. `reshape2` is also a good package, but I like using the tidyverse because of the package ecosystem and good documentation. It's a personal preference. — seagullnutkin, Feb 01 '21 at 15:46
@seagullnutkin reshape2 is not even maintained anymore. OP is referring to the data.table package, which is very well maintained. — David Arenburg, Feb 01 '21 at 15:48
Ah, OK, I didn't realize OP was referring to the function from `data.table`. `reshape2` has a function that pivots tables that's also called `dcast`. — seagullnutkin, Feb 01 '21 at 15:50
Anyhow, one could do `library(data.table) ; dcast(melt(setDT(dt), id = c("Area", "Type", "Scenario")), Area + Type ~ variable + Scenario)` — David Arenburg, Feb 01 '21 at 15:53
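A sketch of the one-step data.table route, assuming data.table is installed and using the question's data with the Type column kept: data.table's dcast() accepts a character vector for value.var, so the year columns can be cast directly without a melt() step. The call in the question refers to an `id` column that `dt` does not have, and picking value.var by position (names(dt)[4:6]) breaks as soon as the column order changes. The sketch below selects the year columns by a name pattern instead; the `^Y[0-9]{4}$` pattern is an assumption about how the real columns are named.

```r
library(data.table)

# Question data, with Scenario and Type as quoted strings and Type kept
dt <- data.frame(
  Area     = rep(1:3, each = 4),
  Scenario = rep(c("a", "b", "c", "d"), times = 3),
  Type     = "EV",
  Y2020 = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6),
  Y2021 = c(0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.0, 1.1, 1.2, 1.5, 1.3, 1.5),
  Y2022 = c(0.3, 0.6, 0.2, 0.7, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6),
  stringsAsFactors = FALSE
)

# Pick every year column by name (assumed pattern), not by position,
# so the call keeps working when more Y* columns are added
year_cols <- grep("^Y[0-9]{4}$", names(dt), value = TRUE)

# Area + Type identify a row; each Scenario value becomes a column suffix.
# With several value.var columns the new names are <value column>_<Scenario>.
wide <- dcast(setDT(dt), Area + Type ~ Scenario, value.var = year_cols)
wide
```

dcast() groups the new columns by value column (Y2020_a to Y2020_d, then Y2021_a, and so on), which differs from the order shown in the question; setcolorder() can rearrange them if that matters.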

barboulotte · Answer 1 · 2021-02-01T16:41:09.087

A solution using base R's reshape() function:

dt <- read.table(header = TRUE, text = "
Area  Scenario Type Y2020  Y2021  Y2022
 1       a      EV   0.5    0.2    0.3
 1       b      EV   0.6    0.4    0.6
 1       c      EV   0.7    0.5    0.8
 1       d      EV   0.8    0.6    0.7
 2       a      EV   0.9    0.8    0.5
 2       b      EV   1.0    1.0    0.6
 2       c      EV   1.1    1.0    0.7
 2       d      EV   1.2    1.1    0.8
 3       a      EV   1.3    1.2    0.9
 3       b      EV   1.4    1.5    1.1
 3       c      EV   1.5    1.3    1.3
 3       d      EV   1.6    1.5    1.6
")

reshape(data = dt,
        idvar = c("Area", "Type"),
        v.names = c("Y2020", "Y2021", "Y2022"),
        timevar = "Scenario",
        direction = "wide")
#>   Area Type Y2020.a Y2021.a Y2022.a Y2020.b Y2021.b Y2022.b Y2020.c Y2021.c
#> 1    1   EV     0.5     0.2     0.3     0.6     0.4     0.6     0.7     0.5
#> 5    2   EV     0.9     0.8     0.5     1.0     1.0     0.6     1.1     1.0
#> 9    3   EV     1.3     1.2     0.9     1.4     1.5     1.1     1.5     1.3
#>   Y2022.c Y2020.d Y2021.d Y2022.d
#> 1     0.8     0.8     0.6     0.7
#> 5     0.7     1.2     1.1     0.8
#> 9     1.3     1.6     1.5     1.6

# Created on 2021-02-01 by the reprex package (v0.3.0.9001)

Regards,
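A variation on the reshape() answer above, as a base R sketch: reshape() has a sep argument (default ".") that controls how the new names are built, so sep = "_" gives the Y2020_a style requested in the question. The year columns are picked by a name pattern instead of being listed, which is an assumption about how the real columns are named, and the leftover row names (1, 5, 9) are reset.

```r
# Question data, with Type kept as an id column
dt <- data.frame(
  Area     = rep(1:3, each = 4),
  Scenario = rep(c("a", "b", "c", "d"), times = 3),
  Type     = "EV",
  Y2020 = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6),
  Y2021 = c(0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.0, 1.1, 1.2, 1.5, 1.3, 1.5),
  Y2022 = c(0.3, 0.6, 0.2, 0.7, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6),
  stringsAsFactors = FALSE
)

wide <- reshape(data = dt,
                idvar = c("Area", "Type"),
                # every column that looks like a year (assumed naming scheme)
                v.names = grep("^Y[0-9]{4}$", names(dt), value = TRUE),
                timevar = "Scenario",
                direction = "wide",
                sep = "_")   # Y2020_a instead of Y2020.a

rownames(wide) <- NULL       # drop the leftover row names 1, 5, 9
wide
```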

Mohan Govindasamy · Answer 2 · 2021-02-01T16:00:56.783

Convert the data to long format first, build the new column names from the year and the scenario, and then pivot back to wide format:

library(tidyverse)

Area <- c(1,1,1,1,2,2,2,2,3,3,3,3)
Scenario <- c("a", "b", "c", "d","a", "b", "c", "d","a", "b", "c", "d")
Type <- c("EV", "EV", "EV", "EV", "EV", "EV", "EV", "EV", "EV", "EV", "EV", "EV")
Y2020 <- c(0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6)
Y2021 <- c(0.2,0.4,0.5,0.6,0.8,1.0,1.0,1.1,1.2,1.5,1.3,1.5)
Y2022 <- c(0.3,0.6,0.2,0.7,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.6)

dt <- data.frame(Area,Type,Scenario, Y2020, Y2021, Y2022)

dt %>% 
  as_tibble() %>% 
  pivot_longer(-(1:3)) %>% 
  mutate(name = paste0(name, "_", Scenario)) %>% 
  select(-3) %>% 
  pivot_wider(names_from = name, values_from = value)
#> # A tibble: 3 x 14
#>    Area Type  Y2020_a Y2021_a Y2022_a Y2020_b Y2021_b Y2022_b Y2020_c Y2021_c
#>   <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1 EV        0.5     0.2     0.3     0.6     0.4     0.6     0.7     0.5
#> 2     2 EV        0.9     0.8     0.5     1       1       0.6     1.1     1  
#> 3     3 EV        1.3     1.2     0.9     1.4     1.5     1.1     1.5     1.3
#> # … with 4 more variables: Y2022_c <dbl>, Y2020_d <dbl>, Y2021_d <dbl>,
#> #   Y2022_d <dbl>

# Created on 2021-02-01 by the reprex package (v0.3.0)

That's great! How do I keep the Type column? I only put it in there so it could be kept! — EllisR8, Feb 01 '21 at 15:55
It was not included in the data frame created in your original code, so it was not there. I have now edited the code. — Mohan Govindasamy, Feb 01 '21 at 16:01
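A shorter tidyr sketch for data with many year columns, assuming tidyr 1.0.0 or later: pivot_wider() accepts several columns in values_from, so the pivot_longer() and paste0() steps can be skipped. With more than one value column, the new names are built as <value column>_<Scenario> using the default names_sep = "_". Selecting the year columns with starts_with("Y") is an assumption about how the real columns are named.

```r
library(tidyr)

# Question data, with Type kept as an id column
dt <- data.frame(
  Area     = rep(1:3, each = 4),
  Scenario = rep(c("a", "b", "c", "d"), times = 3),
  Type     = "EV",
  Y2020 = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6),
  Y2021 = c(0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.0, 1.1, 1.2, 1.5, 1.3, 1.5),
  Y2022 = c(0.3, 0.6, 0.2, 0.7, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6),
  stringsAsFactors = FALSE
)

# Area and Type are not named in names_from/values_from,
# so they are used as the id columns of the wide result
wide <- pivot_wider(dt,
                    names_from  = Scenario,
                    values_from = starts_with("Y"))
wide
```

By default the columns come out grouped by year (Y2020_a to Y2020_d first). If the order shown in the question (all years for a, then b) is wanted, tidyr 1.2.0 and later offer names_vary = "slowest".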
