2

I have grouped and summarized a data frame in R so that I now have a table like:

Group | Value | Count
==========================
   A  |   1   |   4
   A  |   2   |   2
   A  |   10  |   4
   B  |   3   |   2
   B  |   4   |   4
   B  |   2   |   3
   C  |   5   |   3
   C  |   2   |   6

I am interested in finding out the relative frequency of the value 2 within each group:

Group | Relative freq of 2
==========================
   A  |  2/(4+2+4) = 0.2
   B  |  3/(2+4+3) = 0.33
   C  |  6/(3+6) = 0.67

Is there a simple, elegant way of calculating this in R, other than writing a bunch of code with loops and conditionals? Possibly using dplyr.

AbhinavChoudhury
  • 1,167
  • 1
  • 18
  • 38

4 Answers4

3

Using dplyr, after grouping by 'Group', we subset the 'Count' where 'Value' is 2 (assuming there is only a single 'Value' of 2 per each 'Group') and divide by the sum of 'Count'

library(dplyr)
df1 %>%
   group_by(Group) %>% 
   summarise(RelFreq = round(Count[Value==2]/sum(Count), 2))
#  Group RelFreq
#  <chr>   <dbl>
#1     A    0.20
#2     B    0.33
#3     C    0.67

The corresponding data.table option is

library(data.table)
setDT(df1)[, .(RelFreq = round(Count[Value == 2]/sum(Count),2)), by = Group]
akrun
  • 874,273
  • 37
  • 540
  • 662
3

Here is a base R solution:

sapply(split(df1, df1$Group), 
   function(x) round(sum(x$Count[x$Value == 2]) / sum(x$Count), 2))

##  A    B    C 
## 0.20 0.33 0.67 
johannes
  • 14,043
  • 5
  • 40
  • 51
1

You can use the same logic using for loops

for(i in unique(df$Group)){
  df$Relative_freq_of_2[df$Group==i] <- round(df$Count[df$Value==2 & df$Group==i]/sum(df$Count[df$Group==i]),2)
}

df <- unique(df[,c("Group","Relative_freq_of_2")])

Group Relative_freq_of_2
    A               0.20
    B               0.33
    C               0.67
Arun kumar mahesh
  • 2,289
  • 2
  • 14
  • 22
1

This one with sqldf:

library(sqldf)
df1 <- sqldf('select `Group`,`Count` from df where Value=2')
df2 <- sqldf('select `Group`, sum(`Count`) as `Count` from df group by `Group`')
df1$Count <- df1$Count / df2$Count
df1
Group     Count
1     A 0.2000000
2     B 0.3333333
3     C 0.6666667
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63