I want to calculate the proportion of plan
with respect to school type
in the dataset below. Thing is, I have to first subset the dataset a couple of times. In this case, for example, I need to subset the dataset so that I only get schools which offer from level 0
up to level 04
. How can I do that?
- edit:
The way below is working, but I'm not filtering all schools that have all levels from level 0
up to level 04
, I'm getting the schools that have either of them. Ideas to that would be much appreciated.
- data description:
# SCHOOL = name of the school
# Q9 = type of school
# Q11 = levels that the schools offer
# Q40 = types of planning that the school offers related to each level.
Note: All plannings are related to a school level. Hence, If I don't filter by the levels that each school offers and just calculate this columns' prop, my results will be misleading because each school can offer different level options (all of them, 1, 2 of them, etc)
my attempt:
### D) filter schools which only offer UP TO Level 04 ###
##################################### Quest40_2 ##########################
### check vector's names:
unique(quest40_2$Q11)
unwanted_4_2 <- quest40_2 %>%
filter(Q11 %in% c('level05','level06')) ### FILTER UNWANTED SCHOOL LEVELS
### create a vector with UNWANTED SCHOOL NAMES:
filter_vec_4_2 <- unique(unwanted_4_2$SCHOOL) ### get unique names (only school has 1 name)
### assign the original dataframe to a dummy data frame:
filtered_df_4_2 <- quest40_2
### loop over unwanted schools' names to remove them:
for (i in 1 :length(filter_vec_4_2)) {
filtered_df_4_2 <- filtered_df_4_2[!filtered_df_4_2$SCHOOL == filter_vec_4_2[i],]
}
- I need to count how many times EACH 'plan' occurs by each type of 'school' (and get its proportion). Problem = each school can have more than one 'plan' type. Since each school can have more than one
plan type
, if we want to count the proportion ofplan
types according to eachschool type
, we cannot divide by n
b <- filtered_df_4_2 %>% drop_na(Q40) %>%
count(Q40, Q9) %>%
group_by(Q9)
- Which leaves me to:
### counting the unique schools by each type of 'curriculo':
b2 <- b2 %>%
select(SCHOOL, Q9) %>%
unique() %>% count(Q9)
## join dataframes and get the proportion of schools that have each type of 'planejamento'
## within the curriculum types
c <- b %>% full_join(b2, by = 'Q9') %>%
mutate(prop = round((n.x/n.y *100), digits = 2)) %>%
select(-n.x, -n.y)
Q1 = I don't think I'm filtering it correctely since I'm not exclusively filtering the schools that offer alll levels up to 01 to 04. I guess I'm doing an 'or' not an 'and'
Q2 = Is there a way to avoid the loop with
tidyverse
? Thanks in adv.Ultimately, I'm trying to get the starts to => 'do the schools plan their agenda for the levels they offer or not and how does the type of school impact this?' (later this can be modeled, but now I just need percentages)
data:
dput(quest40_2)
structure(list(SCHOOL = structure(c("School1", "School1", "School1",
"School1", "School1", "School1", "School1", "School2", "School2",
"School2", "School2", "School2", "School2", "School2", "School3",
"School3", "School3", "School3", "School3", "School3", "School3",
"School3", "School3", "School4", "School4", "School4", "School4",
"School4", "School4", "School5", "School5", "School5", "School5",
"School5", "School5", "School6", "School6", "School6", "School6",
"School6", "School7", "School7", "School7", "School7", "School7",
"School7", "School8", "School8", "School8", "School8", "School9",
"School9", "School9", "School9", "School9", "School10", "School10",
"School10", "School10", "School10", "School11", "School11", "School11",
"School11", "School11", "School11", "School11", "School11", "School11",
"School12", "School12", "School12", "School12", "School12", "School12",
"School12", "School13", "School13", "School13", "School13", "School13",
"School13", "School13", "School13", "School13", "School13", "School14",
"School14", "School14", "School15", "School15", "School15", "School15",
"School15", "School15", "School16", "School16", "School16", "School16",
"School16", "School16", "School16", "School17", "School17", "School17",
"School17", "School17", "School18", "School18", "School18", "School18",
"School18", "School19", "School19", "School19", "School19", "School19",
"School19", "School20", "School20", "School20", "School21", "School21",
"School21", "School21", "School21", "School22", "School22", "School22",
"School22", "School22", "School23", "School23", "School23", "School23",
"School23", "School23", "School24", "School24", "School24", "School24",
"School24", "School24", "School25", "School25", "School25", "School25",
"School25", "School26", "School26", "School26", "School26", "School26",
"School26", "School26", "School26", "School26", "School27", "School27",
"School27", "School27", "School27", "School27", "School27", "School28",
"School28", "School28", "School28", "School28", "School28", "School28",
"School28", "School29", "School29", "School29", "School29", "School29",
"School29", "School30", "School30", "School30", "School30", "School30",
"School30", "School30", "School30", "School30", "School30", "School30",
"School31", "School31", "School31", "School31", "School31", "School31",
"School31", "School31", "School31", "School31", "School31", "School32",
"School32", "School32", "School32", "School32", "School32", "School32",
"School32", "School32", "School32", "School32", "School33", "School33",
"School33", "School33", "School33", "School33", "School33", "School34",
"School34", "School34", "School34", "School34", "School34", "School34",
"School34", "School35", "School35", "School35", "School35", "School36",
"School36", "School36", "School36", "School36", "School37", "School37",
"School37", "School37", "School37", "School37", "School37", "School37",
"School38", "School38", "School38", "School38", "School39", "School39",
"School39", "School39", "School39", "School39"), class = c("glue",
"character")), Q9 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("typeA", "typeB", "typeC", "typeD"
), class = "factor"), Q11 = structure(c(3L, 7L, 4L, 2L, NA, NA,
NA, 7L, 4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, NA, NA, NA,
NA, 4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, NA, 7L, 4L, 2L,
1L, NA, 4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, NA, 4L, 2L, 1L, 5L,
NA, 4L, 2L, 1L, 5L, NA, 4L, 2L, 1L, 5L, NA, NA, NA, NA, NA, 7L,
4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA, NA,
5L, 6L, NA, 4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, 5L, NA, NA, NA,
7L, 4L, 2L, 1L, NA, 7L, 4L, 2L, 1L, NA, 4L, 2L, 1L, NA, NA, NA,
2L, 1L, NA, 4L, 2L, 1L, 5L, NA, 4L, 2L, 1L, 5L, NA, 7L, 4L, 2L,
1L, NA, NA, 4L, 2L, 1L, NA, NA, NA, 4L, 2L, 1L, 5L, NA, 7L, 4L,
2L, 1L, 5L, NA, NA, NA, NA, 2L, 1L, 5L, NA, NA, NA, NA, 3L, 7L,
4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, 5L, NA, NA, 7L, 4L, 2L, 1L,
5L, 6L, NA, NA, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA,
NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA, NA, NA, 7L, 4L, 2L,
1L, 5L, 6L, NA, 7L, 4L, 2L, 1L, 5L, NA, NA, NA, 3L, 7L, 4L, NA,
7L, 4L, 2L, NA, NA, 4L, 2L, 1L, 5L, NA, NA, NA, NA, 4L, 2L, 1L,
NA, 7L, 4L, 2L, 1L, 5L, NA), .Label = c("level04", "level03",
"level0", "level02", "level05", "level06", "level01"), class = "factor"),
Q40 = structure(c(NA, NA, NA, NA, 2L, 6L, 5L, NA, NA, NA,
NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA,
NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 1L, NA,
NA, NA, NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, NA, 2L, 6L, 4L,
3L, NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 2L,
6L, 5L, NA, NA, NA, NA, 2L, NA, NA, NA, NA, 1L, NA, NA, NA,
2L, 6L, 5L, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
1L, NA, NA, NA, NA, 2L, 1L, NA, NA, NA, 2L, 6L, 5L, NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA, NA,
2L, 6L, 5L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA,
NA, 2L, 1L, NA, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 3L, NA,
NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 3L, NA, NA, NA, NA, NA,
NA, 2L, 6L, 5L, 4L, 3L, NA, NA, NA, NA, NA, NA, 1L, NA, NA,
NA, NA, NA, 2L, 6L, 5L, NA, NA, NA, 2L, NA, NA, NA, 2L, 6L,
NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA, NA, 1L, NA, NA, NA,
NA, NA, 1L), .Label = c("none", "plan1_level0upto02", "plan5_level_05",
"plan4_level_05", "plan3_level_04", "plan2_level_03"), class = "factor")), row.names = c(NA,
-253L), class = c("tbl_df", "tbl", "data.frame"))