1

Disclaimer - title might be misleading - I think part of the reason why I haven't found a solution is that I don't exactly know what to google.

I have a group-level data set in an expanded format; years and country code repeated for every group (id) such as below (manual input)

year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0

I want to add country-year counts as columns at the end so it would look like the following

year   country  id  v1  v2  v3   v1.count  v2.count  v3.count
1991   20       1    1   0   0       3         2         1
1991   20       2    0   1   0       3         2         1
1991   20       3    0   0   1       3         2         1
1991   20       4    1   0   0       3         2         1
1991   20       5    1   0   0       3         2         1
1991   20       6    0   1   0       3         2         1

I have tried aggregate, count, and dplyr with no success. I thought Group by and conditionally count or Frequency count for a specific category might do the trick but I could not get it to work. How can I accomplish this?

Community
  • 1
  • 1

2 Answers2

1

We can use mutate_each from dplyr after grouping by 'year' and 'country'

df1 %>%
   group_by(year, country) %>%
   mutate_each(funs(count = sum), v1:v3)
 #  year country    id    v1    v2    v3 v1_count v2_count v3_count
 #  <int>   <int> <int> <int> <int> <int>    <int>    <int>    <int>
 #1  1991      20     1     1     0     0        3        2        1
 #2  1991      20     2     0     1     0        3        2        1
 #3  1991      20     3     0     0     1        3        2        1
 #4  1991      20     4     1     0     0        3        2        1
 #5  1991      20     5     1     0     0        3        2        1
 #6  1991      20     6     0     1     0        3        2        1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Thanks, I tried it before with just `mutate`, this one does what I need. –  Jul 11 '16 at 00:03
0

I guess you can also use just mutate.

df1 <- read.table(text="year   country  id  v1  v2  v3
1991   20       1    1   0   0
1991   20       2    0   1   0
1991   20       3    0   0   1
1991   20       4    1   0   0
1991   20       5    1   0   0
1991   20       6    0   1   0", head=T, as.is=T)

df1

library(dplyr)

df1 %>% group_by(year, country) %>% 
  mutate(v1.count=sum(v1), v2.count=sum(v2), v3.count=sum(v3))
# Source: local data frame [6 x 9]
# Groups: year, country [1]

#    year country    id    v1    v2    v3 v1.count v2.count v3.count
#   (int)   (int) (int) (int) (int) (int)    (int)    (int)    (int)
# 1  1991      20     1     1     0     0        3        2        1
# 2  1991      20     2     0     1     0        3        2        1
# 3  1991      20     3     0     0     1        3        2        1
# 4  1991      20     4     1     0     0        3        2        1
# 5  1991      20     5     1     0     0        3        2        1
# 6  1991      20     6     0     1     0        3        2        1
Ven Yao
  • 3,680
  • 2
  • 27
  • 42
  • Hi, I think it works with `sum` as the other commenter has suggested, but I also want to use this for continuous variables (with `ineq` for example) and not just for binary variables. –  Jul 11 '16 at 10:36