Summarize the values of a column based on the values of more than two columns of the same dataframe

Question

I have the dataframe below.

year<-c(2016,2016,2017,2017,2016,2016,2017,2017)
city<-c("NY","NY","NY","NY","WS","WS","WS","WS")
spec<-c("df","df","df","df","vb","vb","vb","vb")
num<-c(45,67,89,90,45,67,89,90)
df<-data.frame(year,city,spec,num)

I would like to know if it is possible to sum the num based on year,city and spec columns in order to bring it from this form:

year city spec num
1 2016   NY   df  45
2 2016   NY   df  67
3 2017   NY   df  89
4 2017   NY   df  90
5 2016   WS   vb  45
6 2016   WS   vb  67
7 2017   WS   vb  89
8 2017   WS   vb  90

to this:

year city spec num
1 2016   NY   df 112
2 2017   NY   df 179
3 2016   WS   vb 112
4 2017   WS   vb 179

score 1 · Answer 1 · answered Jun 15 '19 at 01:29

1

One way is to use sqldf package:

sqldf("Select year, city, spec, sum(num) from df 
      group by year, city, spec order by city")

  year city spec sum(num)
1 2016   NY   df      112
2 2017   NY   df      179
3 2016   WS   vb      112
4 2017   WS   vb      179

Using dplyr

df %>% 
  group_by(year, city, spec) %>% 
  summarise(SumNum = sum(num)) %>% 
  arrange(city)

answered Jun 15 '19 at 01:29

Jason Mathews

765
3
13

hmm in my actual dataset the dplyr solution gives me the same result for all the rows – firmo23 Jun 15 '19 at 02:08
It sounds dumb but restart your RStudio and run it again – Jason Mathews Jun 15 '19 at 02:10

score 1 · Accepted Answer · answered Jun 15 '19 at 01:30

Possible duplicate, but here is an answer:

library(tidyverse)

df %>%
  group_by(year,city,spec) %>%
  summarise(sum = sum(num))

...results in ...

# A tibble: 4 x 4
# Groups:   year, city [4]
   year city  spec    sum
  <dbl> <fct> <fct> <dbl>
1  2016 NY    df      112
2  2016 WS    vb      112
3  2017 NY    df      179
4  2017 WS    vb      179

Summarize the values of a column based on the values of more than two columns of the same dataframe

2 Answers2