This is a follow up question to the following problem give here
I have the following data
Data:
df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L,
300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L),variable3=c(10L, 1L, 1L, 4L, 2L, 3L, 3L, 10L, 3L),variable4=c(2L, 1L, 1L, 7L, 2L, 3L, 3L, 8L, 3L)).Names = c("Org_ID", "Market_volume", "Indicator_variable","Var3","Var4"
), class = "data.frame", row.names = c(NA, -9L))
Using (dplyr), i calculated the % of NA's by market volume by Org_ID via the following function
df %>%
group_by(Org_ID) %>%
summarize(sum_market_vol = sum(Market_volume*!Indicator_variable),
tot_market_vol = sum(Market_volume)) %>%
transmute(Org_ID, Perc_Market_Vol = 100*sum_market_vol/tot_market_vol)
Result:
# A tibble: 3 x 2
Org_ID Perc_Market_Vol
<int> <dbl>
1 1 83.33333
2 2 0.00000
3 3 100.00000
Question: I want to subset my original data by deleting all rows of Org_ID (say 2) # X if perc_market_vol<30. That is i do not want to delete individual rows of the same org_id, but Org_id as a whole, say all counts of Org_id =1 or org_id = 2. How can i subset it linking two tables or functions?
I want the new data look like this:
df1 = structure(list(Org_ID = c(1L, 1L, 1L, 3L, 3L, 3L, 3L),
Market_volume = c(100L, 200L, 300L, 400L, 200L,
300L, 100L), Indicator_variable = c(1L, 0L, 0L, 0L,
0L, 0L, 0L),variable3=c(10L, 1L, 1L, 3L, 3L, 10L, 3L),variable4=c(2L, 1L, 1L, 3L, 3L, 8L, 3L)).Names = c("Org_ID", "Market_volume", "Indicator_variable","Var3","Var4"
), class = "data.frame", row.names = c(NA, -7L))