3

How to cast aggregated values over a range in R and fill missing range values with zero.

df <- data.frame (year = sample(c(2014:2016), 100, replace=T),
                  month = sample(c(1:5,8:12), 100, replace=T),
                  int = 1)

# install.packages("reshape")
library(reshape)
month <- cast(df, year ~ month, sum, value = 'int')
month

Output:

# output
  year 1 2 3 4 5 8 9 10 11 12
1 2014 6 5 4 3 4 4 3  3  9  2
2 2015 4 9 1 3 1 4 3  3  2  3
3 2016 0 3 3 4 4 1 4  1  3  1

How do I set the missing months to zero? The result should be like that:

# output
  year 1 2 3 4 5 >6< >7< 8 9 10 11 12
1 2014 6 5 4 3 4  0   0  4 3  3  9  2
2 2015 4 9 1 3 1  0   0  4 3  3  2  3
3 2016 0 3 3 4 4  0   0  1 4  1  3  1

Is there a way to do it over the cast function?

wittich
  • 2,079
  • 2
  • 27
  • 50

1 Answers1

4

We can use tidyverse to convert the 'month' to factor with levels specified as 1:12, get the sum of 'int' grouped by 'year', 'month' and spread to 'wide' format with drop=FALSE

library(tidyverse)
df %>%
   group_by(year, month = factor(month, levels = 1:12)) %>% 
   summarise(int = sum(int)) %>% 
   spread(month, int, drop = FALSE, fill = 0) 
#     year   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`
#* <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  2014     3     2     2     1     2     0     0     4     1     5     5     6
#2  2015     2     7     5     2     4     0     0     5     3     3     4     5
#3  2016     0     4     5     5     2     0     0     3     2     1     5     2

Or using dcast in a single line

library(data.table)
dcast(setDT(df), year ~ factor(month, levels = 1:12), sum, drop = FALSE)
#   year 1 2 3 4 5 6 7 8 9 10 11 12
#1: 2014 3 2 2 1 2 0 0 4 1  5  5  6
#2: 2015 2 7 5 2 4 0 0 5 3  3  4  5
#3: 2016 0 4 5 5 2 0 0 3 2  1  5  2

Or with xtabs from base R

xtabs(int~year+factor(month, levels = 1:12), df)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Nice answer! I prefer **`xtabs`** as it doesn't require any addition package! – wittich Feb 20 '17 at 13:10
  • 1
    For **`xtabs`** its helpful to use **`as.data.frame.matrix()`** on the output. – wittich Feb 20 '17 at 13:29
  • +1. For the `dcast` solution, you don't need `setDT` and `data.table` at all, just `dcast(df, ...)`. You need to be careful though with `drop = FALSE`, it will expand all unused levels of all factors in the formula, not just the month, so if you don't want that, drop those unwanted levels or don't let the respective factor be a factor (e.g. by using `stringsAsFactors = FALSE` when you load the data). – Tomas Jul 22 '18 at 06:16
  • @TMS Yes, it will be using `dcast` from `reshape2` as I already loaded both packages. – akrun Jul 22 '18 at 16:14