Adding variable counts via multiple grouping

Question

Disclaimer - title might be misleading - I think part of the reason why I haven't found a solution is that I don't exactly know what to google.

I have a group-level data set in an expanded format; years and country code repeated for every group (id) such as below (manual input)

year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0

I want to add country-year counts as columns at the end so it would look like the following

year   country  id  v1  v2  v3   v1.count  v2.count  v3.count
1991   20       1    1   0   0       3         2         1
1991   20       2    0   1   0       3         2         1
1991   20       3    0   0   1       3         2         1
1991   20       4    1   0   0       3         2         1
1991   20       5    1   0   0       3         2         1
1991   20       6    0   1   0       3         2         1

I have tried aggregate, count, and dplyr with no success. I thought Group by and conditionally count or Frequency count for a specific category might do the trick but I could not get it to work. How can I accomplish this?

That would sum up `v1` for all years and countries in the `df`, no? — , Jul 11 '16 at 00:01

score 1 · Accepted Answer · answered Jul 10 '16 at 22:31

We can use mutate_each from dplyr after grouping by 'year' and 'country'

df1 %>%
   group_by(year, country) %>%
   mutate_each(funs(count = sum), v1:v3)
 #  year country    id    v1    v2    v3 v1_count v2_count v3_count
 #  <int>   <int> <int> <int> <int> <int>    <int>    <int>    <int>
 #1  1991      20     1     1     0     0        3        2        1
 #2  1991      20     2     0     1     0        3        2        1
 #3  1991      20     3     0     0     1        3        2        1
 #4  1991      20     4     1     0     0        3        2        1
 #5  1991      20     5     1     0     0        3        2        1
 #6  1991      20     6     0     1     0        3        2        1

Thanks, I tried it before with just `mutate`, this one does what I need. — , Jul 11 '16 at 00:03

score 0 · Answer 2 · answered Jul 11 '16 at 01:12

I guess you can also use just mutate.

df1 <- read.table(text="year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0", head=T, as.is=T)

df1

library(dplyr)

df1 %>% group_by(year, country) %>% 
  mutate(v1.count=sum(v1), v2.count=sum(v2), v3.count=sum(v3))
# Source: local data frame [6 x 9]
# Groups: year, country [1]

#    year country    id    v1    v2    v3 v1.count v2.count v3.count
#   (int)   (int) (int) (int) (int) (int)    (int)    (int)    (int)
# 1  1991      20     1     1     0     0        3        2        1
# 2  1991      20     2     0     1     0        3        2        1
# 3  1991      20     3     0     0     1        3        2        1
# 4  1991      20     4     1     0     0        3        2        1
# 5  1991      20     5     1     0     0        3        2        1
# 6  1991      20     6     0     1     0        3        2        1

Hi, I think it works with `sum` as the other commenter has suggested, but I also want to use this for continuous variables (with `ineq` for example) and not just for binary variables. — , Jul 11 '16 at 10:36

Adding variable counts via multiple grouping

2 Answers2