R Sum every n rows across n columns

Question

I have a data.frame that looks like this:

Geotype <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
Strategy <- c("Demand", "Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5", "Strategy 6")
Year.1  <- c(1:21)
Year.2  <- c(1:21)
Year.3  <- c(1:21)
Year.4  <- c(1:21)
mydata <- data.frame(Geotype,Strategy,Year.1, Year.2, Year.3, Year.4)

I want to sum each Strategy for each Year.

This means I need to sum 6 rows down each column in the data frame and then skip the Demand row. I then want to repeat this for all columns (40 years).

I want the output data frame to look like this:

Geotype.output <- c(1, 2, 3)
Year.1.output  <- c(27, 69, 111)
Year.2.output  <- c(27, 69, 111)
Year.3.output  <- c(27, 69, 111)
Year.4.output  <- c(27, 69, 111)
output <- data.frame(Geotype.output,Year.1.output, Year.2.output, Year.3.output, Year.4.output)

Any suggestions on how to do this elegantly? I tried to hack a solution together using this, this and this, but I wasn't successful because I need to skip a row.

Cath · Answer 1 · 2016-10-10T11:45:12.497

You can try with base R aggregate function (to aggregate data by Geotype, using function sum as "unique value") but using a reduced data.frame (without the "Demand" rows and the Strategy column):

aggregate(.~Geotype, data=mydata[mydata$Strategy !="Demand", -2], FUN=sum)
#  Geotype Year.1 Year.2 Year.3 Year.4
#1       1     27     27     27     27
#2       2     69     69     69     69
#3       3    111    111    111    111

dww · Accepted Answer · 2016-10-10T11:44:20.367

5

Using data.table:

library(data.table)
setDT(mydata)
output = mydata[Strategy != "Demand", 
             .(Year.1.output = sum (Year.1), 
               Year.2.output = sum (Year.2), 
               Year.3.output = sum (Year.3), 
               Year.4.output = sum (Year.4)),
             by = Geotype]

#    Geotype Year.1.output Year.2.output Year.3.output Year.4.output
# 1:       1            27            27            27            27
# 2:       2            69            69            69            69
# 3:       3           111           111           111           111

We can simplify this to deal more easily with many year columns by

setDT(mydata)[Strategy != "Demand", 
             lapply(.SD, sum), 
             by=Geotype, 
             .SDcols=grep("Year", names(mydata))]

edited Oct 10 '16 at 11:44

answered Oct 10 '16 at 11:28

dww

30,425
5
68
111

8

you can simplify by `setDT(ydata)[Strategy != "Demand", lapply(.SD, sum), by=Geotype, .SDcols=grep("Year", names(mydata))]` (which should be more convenient as OP has 40 "Year" columns...) – Cath Oct 10 '16 at 11:31
1

This worked great. I actually used Cath's more elegant suggestion. Please can you provide an edit in your answer with the short data.table version. – Thirst for Knowledge Oct 10 '16 at 11:42
Thanks @Cath I added your suggestion into the answer – dww Oct 10 '16 at 11:44

Wietze314 · Answer 3 · 2016-10-10T11:42:25.583

I would prefer getting my data in a long format like so:

library(dplyr)
library(tidyr)
library(reshape2)

mydata %>% gather(key, value, - Geotype, - Strategy) %>%
  filter(Strategy!="Demand") %>% group_by(Geotype, key) %>%
  summarize(sum = sum(value))

resultin in:

Geotype    key   sum
<dbl>  <chr> <int>
1        1 Year.1    27
2        1 Year.2    27
3        1 Year.3    27
4        1 Year.4    27
5        2 Year.1    69
6        2 Year.2    69
7        2 Year.3    69
8        2 Year.4    69
9        3 Year.1   111
10       3 Year.2   111
11       3 Year.3   111
12       3 Year.4   111

Using spread:

mydata %>% gather(key, value, - Geotype, - Strategy) %>%
  filter(Strategy!="Demand") %>% group_by(Geotype, key) %>%
  summarize(sum = sum(value)) %>% spread(key, sum)

yields

Geotype Year.1 Year.2 Year.3 Year.4
*   <dbl>  <int>  <int>  <int>  <int>
1       1     27     27     27     27
2       2     69     69     69     69
3       3    111    111    111    111

Thanx @zx8754, I used to use `dcast` but I like `spread` more! — Wietze314, Oct 10 '16 at 11:42

score 0 · Answer 4 · answered Oct 10 '16 at 14:10

0

My reputation is too low to comment but you can use dplyr and summarize_each.

mydata %>% dplyr::filter(Strategy!="Demand") %>% group_by(Geotype) %>% summarize_each(funs(sum), contains("Year"))

answered Oct 10 '16 at 14:10

Raphael Lee

111
1
10

R Sum every n rows across n columns

4 Answers4