Summarizing data by subgroups

Question

My dataset looks like this

Org_ID      Market volume  Indicator variable
1                100              1

1                200              0

1                300              0

2                 50              1

2                500              1

3                400              0

3                200              0

3                300              0

3                100              0

And i want to summarize it by market TRx and org_id by calculating the % of 0 indicator variables in terms of market volume, as follows:

Org_ID   % of 0's by market volume
1   83.3%

2   0%

3   100%

I tried subgroups but can't seem to be able to do this. Can anyone suggest what are some of the ways i can do?

Follow this approach: https://stackoverflow.com/questions/29549731/finding-percentage-in-a-sub-group-using-group-by-and-summarise . — AntoniosK, Nov 03 '17 at 14:35
Also, you're more likely to get a quick answer by providing the data in a quickly accessible form: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — ssp3nc3r, Nov 03 '17 at 14:39
Possible duplicate of [Finding percentage in a sub-group using group\_by and summarise](https://stackoverflow.com/questions/29549731/finding-percentage-in-a-sub-group-using-group-by-and-summarise) — Rui Barradas, Nov 03 '17 at 18:09

acylam · Accepted Answer · 2017-11-03T15:28:57.627

with dplyr:

library(dplyr)

df %>%
  group_by(Org_ID) %>%
  summarize(sum_market_vol = sum(Market_volume*!Indicator_variable),
            tot_market_vol = sum(Market_volume)) %>%
  transmute(Org_ID, Perc_Market_Vol = 100*sum_market_vol/tot_market_vol)

Result:

# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000

Data:

df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L)), .Names = c("Org_ID", "Market_volume", "Indicator_variable"
), class = "data.frame", row.names = c(NA, -9L))

Thank you so much! This worked perfectly! – T.Z Nov 07 '17 at 22:25 — T.Z, Nov 07 '17 at 22:25

Summarizing data by subgroups

1 Answers1

Linked