1

I recently asked a question regarding database managemen in R packages tidyr, dplyr or similar in this link (Reorganizing columns by two column combination).

It was helpful, I managed to do the proposed code, but I was wring in the final format I was requiring.

I cannot manage to reach this format and was wondering how could be easily done.

Given this as an example database:

Factor 1    Factor 2        Year    value1   value2
A            green          2016     1.9      20
A            green          2015     1.9      20
A            green          2015      4       30
B            yellow         2015      3       10
B            yellow         2016      8       11

And trying to obtain:

Factor 1    Factor 2   value1.2015   value1.2016 value2.2015 value2.2016   
A            green          5.9          1.9            50           20
B            yellow         3            8              10           11

So, it would be setting common identifiers for Factor, 1 and Factor 2, to spread the dataset by years and summing up common years for values 1 and 2

I am a begginer with tidyr, dplyr and cannot easily manage to do this.

I have been able to spread the dataset by years doing :

df.spread<-df %>%
  gather(value1,value2,-factor1,-factor2,-Year) %>%
  unite(Year,Year, value1, sep = "") %>%
  spread(Year,value2)

but this does not sum the value1 for the common years as I want it.

gfmg1992
  • 99
  • 2
  • 8

1 Answers1

0

We group by 'Factor1', 'Factor2', 'Year', get the sum of all columns (summarise_all), then gather into 'long' format, unite the 'Year', 'key' columns together to create a single column and spread from 'long' to 'wide' format

library(tidyverse)
df %>% 
    group_by(Factor1, Factor2, Year) %>%
    summarise_all(sum) %>% 
    gather(key, value, value1:value2) %>% 
    unite(Year, key, Year, sep=".") %>% 
    spread(Year, value)
# Groups:   Factor1, Factor2 [2]
#  Factor1 Factor2 value1.2015 value1.2016 value2.2015 value2.2016
#  <chr>   <chr>         <dbl>       <dbl>       <dbl>       <dbl>
#1 A       green           5.9         1.9          50          20
#2 B       yellow          3           8            10          11

This can also be done with dcast from data.table, where we can pass multiple value.var columns and a fun.aggregate argument

library(data.table)
dcast(setDT(df), Factor1 + Factor2 ~ Year, value.var = c('value1', 'value2'), sum)
#    Factor1 Factor2 value1_2015 value1_2016 value2_2015 value2_2016
#1:       A   green         5.9         1.9          50          20
#2:       B  yellow         3.0         8.0          10          11

data

df <- structure(list(Factor1 = c("A", "A", "A", "B", "B"), Factor2 = c("green", 
"green", "green", "yellow", "yellow"), Year = c(2016L, 2015L, 
2015L, 2015L, 2016L), value1 = c(1.9, 1.9, 4, 3, 8), value2 = c(20L, 
20L, 30L, 10L, 11L)), class = "data.frame", row.names = c(NA, 
-5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    That is exactly what I was looking for. Thanks for providing also an alternative package. I need to get my head around the tidyverse functions! – gfmg1992 Jun 24 '19 at 11:45