I have this data frame:

YEAR   NATION    VOTE
2015     NOR        1
2015     USA        0
2015     CAN        1
2015     RUS        1
2014     USA        1
2014     USA        1
2014     USA        0
2014     NOR        1
2014     NOR        0
2014     CAN        1

...and it goes on with more years, nations and votes. VOTE is binary: yes (1) or no (0). I am trying to produce an output table that aggregates by year and nation and that also shows the total number of votes for each nation (the count of 0s and 1s) together with the number of 1s, like the one sketched below (sumVOTES being the total number of votes for that nation in that year, i.e. the count of all 1s and 0s):

YEAR   NATION    VOTE-1   sumVOTES    %-1s
2015     USA          8         17    47.1
2015     NOR          7         13    53.8
2015     CAN          3         11    27.2
2014     etc.
etc.
Dag

1 Answer

You are not providing your data.frame in a reproducible manner. But this should work...

library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1' = sum(VOTE==1), 
              'sumVOTES' = .N, 
              '%-1s' = 1e2*sum(VOTE==1)/.N), 
 by = .(YEAR, NATION)]

setDT() converts the data.frame to a data.table by reference, i.e. in place and without making a copy.
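
Since the question does not include reproducible data, here is a minimal sketch that rebuilds the ten sample rows shown in the question and runs the same call on them. The name `res` is only an illustrative choice, and the numeric/character column types are an assumption about the real data.

```r
library(data.table)

# Sample rows copied from the question; 'df' follows the naming used above
df <- data.frame(
  YEAR   = c(2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, 2014, 2014),
  NATION = c("NOR", "USA", "CAN", "RUS", "USA", "USA", "USA", "NOR", "NOR", "CAN"),
  VOTE   = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1)
)

# One row per (YEAR, NATION): number of 1s, number of votes, share of 1s in percent
res <- setDT(df)[, .('VOTE-1'   = sum(VOTE == 1),
                     'sumVOTES' = .N,
                     '%-1s'     = 1e2 * sum(VOTE == 1) / .N),
                 by = .(YEAR, NATION)]

# setDT() changed 'df' in place, so it is now a data.table too
is.data.table(df)

print(res)
```

For the 2014 USA rows (votes 1, 1, 0) this gives VOTE-1 = 2, sumVOTES = 3 and a %-1s of about 66.7.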

statquant
    With `dplyr` you can do that in 3 simple steps:
    1. Sum up per year and nation: `library(dplyr)` `d1 <- data %>% group_by(YEAR, NATION) %>% summarise(sum_of_year = sum(VOTE))`
    2. Sum up votes per nation: `d2 <- data %>% group_by(NATION) %>% summarise(sum_of_1s = sum(VOTE))`
    3. Merge the two created data frames: `d3 <- merge(d1, d2, by = "NATION")`
    – kabr Apr 20 '16 at 20:02
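
The steps in the comment sum the 1s but never count the total number of votes per year and nation, which the question also asks for. As a possible alternative, a single grouped `summarise()` can produce all three columns at once. This is only a sketch: the data frame name `data` follows the comment, the sample rows are copied from the question, and the output column names (`votes_1`, `sum_votes`, `pct_1`) are illustrative choices, not names from the original post.

```r
library(dplyr)

# Sample rows copied from the question; 'data' follows the comment's naming
data <- data.frame(
  YEAR   = c(2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, 2014, 2014),
  NATION = c("NOR", "USA", "CAN", "RUS", "USA", "USA", "USA", "NOR", "NOR", "CAN"),
  VOTE   = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1)
)

res <- data %>%
  group_by(YEAR, NATION) %>%
  summarise(
    votes_1   = sum(VOTE == 1),              # number of yes votes
    sum_votes = n(),                         # total votes (0s and 1s)
    pct_1     = 100 * votes_1 / sum_votes,   # share of yes votes in percent
    .groups   = "drop"                       # return an ungrouped result
  )

print(res)
```

Grouping by both YEAR and NATION in one step avoids the merge and keeps the counts specific to each year.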