How to sum categorical variable across variables

Question

My data set looks like this

df<-read.table(text="group, race
               1, 1
               1, 1
               1, 4
               2, 1
               2, 3
               2, 4
               3, 1
               2, 1
               1, 1", header=T, sep=",")

I would like to get the sum of the count of each race BY each group. I have tried using the aggregate function but to no avail

I hope to have output that looks something like this:

group   race   total
1       1      2
1       4      1
2       1      2
.
.
.
3       1      1

What do you mean by sum of count? This looks like data where you would count values of each group and race, but I'm not sure what you would then sum. Your output doesn't quite match your data: group 1, race 1 occurs 3 times, not 2 — camille, May 20 '19 at 21:43
On earlier version of @akrun comment: ```df %>% add_count(group, race)``` does the same thing that I have with `ave()`. — M--, May 20 '19 at 22:08
@Cath Maybe ```as.data.frame(table(group=df$group, race=df$race))``` or ```as.data.frame(table(df))``` to preserve the names. — M--, May 22 '19 at 18:28

M-- · Answer 1 · 2019-05-20T22:03:58.510

It can be done in base using aggregate():

with(df, aggregate(list(count=race), by=list(group=group,race=race), FUN=length))

#>   group race count
#> 1     1    1     3
#> 2     2    1     2
#> 3     3    1     1
#> 4     2    3     1
#> 5     1    4     1
#> 6     2    4     1

or using ave() (if you want to preserve your data structure and just add column total):

df$total <- ave(df$race, df[,c("group","race")], FUN=length)

#>   group race total
#> 1     1    1     3
#> 2     1    1     3
#> 3     1    4     1
#> 4     2    1     2
#> 5     2    3     1
#> 6     2    4     1
#> 7     3    1     1
#> 8     2    1     2
#> 9     1    1     3

or `aggregate(list(count = df$race), df, length)` – G. Grothendieck May 20 '19 at 22:00 — G. Grothendieck, May 20 '19 at 22:00

score 0 · Answer 2 · answered May 20 '19 at 21:39

library(tidyverse)

df<-tribble(~"group", ~"race",
               1, 1,
               1, 1,
               1, 4,
               2, 1,
               2, 3,
               2, 4,
               3, 1,
               2, 1,
               1, 1)
df %>% 
    count(group, race)
#> # A tibble: 6 x 3
#>   group  race     n
#>   <dbl> <dbl> <int>
#> 1     1     1     3
#> 2     1     4     1
#> 3     2     1     2
#> 4     2     3     1
#> 5     2     4     1
#> 6     3     1     1

^{Created on 2019-05-20 by the reprex package (v0.2.1)}

hank_044 · Answer 3 · 2019-05-20T21:50:20.857

0

Similar to dylanjm but slightly different logic:

df %>%
group_by(group, race) %>%
summarise(Total = length(race))

Yields the same result:

# A tibble: 6 x 3
# Groups:   group [?]
  group  race Total
  <int> <int> <int>
1     1     1     3
2     1     4     1
3     2     1     2
4     2     3     1
5     2     4     1
6     3     1     1

edited May 20 '19 at 21:50

answered May 20 '19 at 21:43

hank_044

176
6

use `length` instead of `sum`! – M-- May 20 '19 at 21:45
Thanks - that's the dangers of multi-tasking. – hank_044 May 20 '19 at 21:51

How to sum categorical variable across variables

3 Answers3