I'm trying to calculate the ratio between n and sum(n) within each group. I know I'm not far from the solution.

Data:

df_rld %>% 
   select(type, run_length) %>% 
   mutate(run_length = as.numeric(run_length)) %>% 
   group_by(type, run_length) %>% 
   count(run_length)

type  | run_length | n 
---------------------------
A     |      15    | 1
B     |      24    | 3
B     |      26    | 7
C     |      27    | 10
C     |      28    | 2

What I want:

type  | run_length | n     | ratio
-----------------------------------------
A     |      15    | 1     | 1 / 1 = 1
B     |      24    | 3     | 3 / (3+7) = 0.3
B     |      26    | 7     | 7 / (3+7) = 0.7
C     |      27    | 10    | 10 / (10+2) = 0.83
C     |      28    | 2     | 2 / (10+2) = 0.17

The denominator of the ratio is the sum of n within each type, but I don't know how to calculate it. Using group_by, I can only manage to get a sum of all n, so the ratio is equal to 1 for some reason. I would like to do this without joining tables, for simplicity's sake.

1 Answer

You should group by type only; that way n/sum(n) gives the correct calculation. If you group by both type and run_length, every group that comes out of count() holds a single row, so sum(n) equals n and the ratio is always 1. (Duplicate rows with the same type and run_length would split the ratio between them, for example 0.5 each for two rows with equal n, but count() never produces such duplicates.)

df_rld %>% 
   select(type, run_length) %>% 
   mutate(run_length = as.numeric(run_length)) %>% 
   group_by(type, run_length) %>% 
   count(run_length) %>%
   group_by(type) %>% 
   mutate(ratio = n/sum(n))

#> # A tibble: 5 x 4
#> # Groups:   type [3]
#>   type  run_length     n ratio
#>   <fct>      <int> <int> <dbl>
#> 1 A             15     1 1    
#> 2 B             24     3 0.3  
#> 3 B             26     7 0.7  
#> 4 C             27    10 0.833
#> 5 C             28     2 0.167
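
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The `df_rld` below is a made-up stand-in built so that its counts match the table in the question, and the names `counts`, `both` and `by_type` are only for illustration. It compares grouping by both columns with grouping by type alone.

```r
library(dplyr)

# Hypothetical stand-in for df_rld: one row per observed run,
# constructed so the counts match the table in the question.
df_rld <- tibble(
  type       = rep(c("A", "B", "B", "C", "C"), times = c(1, 3, 7, 10, 2)),
  run_length = rep(c(15, 24, 26, 27, 28),      times = c(1, 3, 7, 10, 2))
)

# count() on an ungrouped data frame returns an ungrouped result.
counts <- df_rld %>%
  count(type, run_length)

# Grouped by both columns: one row per group, so n / sum(n) is n / n.
both <- counts %>%
  group_by(type, run_length) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

# Grouped by type alone: sum(n) is the per-type total.
by_type <- counts %>%
  group_by(type) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

both$ratio     # every value is 1
by_type$ratio  # 1, 0.3, 0.7, 10/12, 2/12
```

Passing both columns to count() directly avoids the first group_by(), and the final ungroup() keeps the grouping from leaking into later steps.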
Allan Cameron