Pivot dataset in R with summaries from different columns

Question

I have a dataset on electoral results that looks like this.

ward_name <- c("Abbey", "Abbey", "Abbey", "Abbey","Abbey", "Abbey", "Alibon", "Alibon", "Alibon", "Alibon")
council_name <- c("Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barnet")
cand_name <- c ("Joe", "Anthony", "Mohammed", "Vivian", "Elizabeth", "Ruth", "Brenda", "Sashi", "Aatif", "Peter")
party_abbrev <- c("Grn", "L", "C", "LD", "C", "L", "C", "Grn", "C", "Grn")
votes <- c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
elected <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1)
df <- data.frame(ward_name, council_name, cand_name, party_abbrev, votes, elected)

What I basically want is to have a dataframe like this one:

council name <- c("Barking and Dagenham", "Barnet")
elected_L <-  c(2, 0)
elected_C <- c(1, 0)
elected_LD <- c(0, 0)
elected_Gr <- c(1, 1)
no_candidates <- c(9, 1)
df.2 <- data.frame(council_name, elected_L, elected_C, elected_LD, elected_Gr, no_candidates)

I have tried different things in order to get it, both trying to get summaries of the elected counts with mutate and group_by and then try to pivot to longer, and otherwise pivot first and then try to manipulate the data. But I am not being able to get what I want, either because the pivot doesn't work or because the data manipulation shortens the number of counts in ways I do not want it to.

Thanks a lot!

score 2 · Answer 1 · answered Jun 08 '20 at 10:12

here is a data.table approach

library( data.table )
#make it a data.table
setDT(df)
#cast ansd join
dcast( df, 
       council_name ~ party_abbrev, 
       value.var = "elected",
       fun.aggregate = sum )[ df[, .N, by = council_name], 
                              no_candidates := i.N, 
                              on = .(council_name) ][]

output

#            council_name C Grn L LD no_candidates
# 1: Barking and Dagenham 1   1 2  0             9
# 2:               Barnet 0   1 0  0             1

score 1 · Accepted Answer · answered Jun 08 '20 at 09:57

We can first count total number of candidates for each council_name, keep only rows with elected candidates, count them for each council_name and party_abbrev and get data in wide format using pivot_wider.

library(dplyr)

df %>%
  add_count(council_name, name = 'No_candidates') %>%
  filter(elected == 1) %>%
  count(council_name, party_abbrev, No_candidates) %>%
  tidyr::pivot_wider(names_from = party_abbrev, values_from = n, 
                     values_fill = list(n = 0), names_prefix = 'elected_')

# A tibble: 2 x 5
#  council_name         No_candidates elected_C elected_Grn elected_L
#  <chr>                        <int>     <int>       <int>     <int>
#1 Barking and Dagenham             9         1           1         2
#2 Barnet                           1         0           1         0

Pivot dataset in R with summaries from different columns

2 Answers2