How to deselect based on condition (R)

Question

I have a dataset on college enrollment. I'm trying to find the proportion of students enrolled in biology at each institution. I first find the total enrollment (EFTOTLT) for each school using:

    #find sum of students by school
    total_enrollment <- school_data_unit_cip %>%
      group_by(UNITID) %>%
      summarise(Freq = sum(EFTOTLT))

This yields a tibble that's 2,207 x 2, then I find the enrollment for Biology for each school using:

    #find total biology enrollment by school
    total_biol_enrollment <- school_data_unit_cip %>%
      group_by(UNITID) %>%
      filter(CIPCODE == "26") %>%
      summarise(Freq = sum(EFTOTLT))

Then I realize this yields a tibble that's 1,560 x 2. So there are obviously schools that don't offer biology or don't have biology students.

Is there a way to deselect schools from the first tibble that don't have the CIPCODE 26? Or I guess is there a way to remove schools from the first list that don't exist in the second list?

Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Mar 08 '22 at 19:41

ReneSch78 · Answer 1 · 2022-03-08T20:13:04.300

0

Updated after the remarks in the other answer.

I think you can filter them out if you group first, but I don't know for sure without the data:

    # keep all rows of every school that has at least one CIPCODE "26" row
    total_biol_enrollment <- school_data_unit_cip %>%
      group_by(UNITID) %>%
      filter(any(CIPCODE == "26"))
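
As a minimal sketch of how a grouped filter behaves, here is a toy example. The data below is made up (three hypothetical schools, only two of which have a CIPCODE "26" row), and `schools_with_biol` is a name introduced just for illustration; the real dataset may be structured differently.

```r
library(dplyr)

# Hypothetical toy data: schools 1 and 3 have CIPCODE "26" rows, school 2 does not
school_data_unit_cip <- tibble(
  UNITID  = c(1, 1, 1, 2, 2, 3, 3),
  CIPCODE = c("26", "26", "52", "52", "11", "26", "11"),
  EFTOTLT = c(10, 20, 70, 40, 60, 5, 15)
)

# Keep every row of each school that has at least one CIPCODE "26" row
schools_with_biol <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  filter(any(CIPCODE == "26")) %>%
  ungroup()

# Total enrollment per remaining school (school 2 is gone)
total_enrollment <- schools_with_biol %>%
  group_by(UNITID) %>%
  summarise(Freq = sum(EFTOTLT))
```

Because the condition is evaluated once per `UNITID` group, a school is kept or dropped as a whole, so no individual demographic rows are lost from the schools that remain.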

edited Mar 08 '22 at 20:13

answered Mar 08 '22 at 19:33

ReneSch78


score 0 · Answer 2 · answered Mar 08 '22 at 19:40

0

Without sample data it's a guess, but ... assuming that each school may have more than one CIPCODE, and you want only schools that contain at least CIPCODE == "26", then perhaps

    school_data_unit_cip %>%
      group_by(UNITID) %>%
      filter("26" %in% CIPCODE) %>%
      ungroup()

answered Mar 08 '22 at 19:40

r2evans


Yeah, unfortunately I did a tragic job of explaining. Essentially, there are tens of thousands of rows because each observation is subcategorized by various demographics. So by filtering out an observation without a 26 CIPCODE, I could just be eliminating one demographic at a school from the list. – Ryan O'Toole Mar 08 '22 at 19:51
If I can make a list of the UNITIDs from the second tibble, I guess I'm wondering whether I can use that to my advantage by keeping only those schools from the original list? – Ryan O'Toole Mar 08 '22 at 19:52
I really don't know, and would prefer to not speculate without knowing your data. – r2evans Mar 08 '22 at 19:56
No worries, appreciate your help. It's a pretty low-stakes analysis. I was supposed to do it in Excel for a class, but I'm self-teaching R so I thought I'd try. – Ryan O'Toole Mar 08 '22 at 19:58
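
The idea raised in the comments, using the UNITIDs from the second tibble to restrict the first, could be sketched with a dplyr join. This is only an illustration on made-up data, not something tested against the real dataset; the names `enrollment_biol_schools`, `biol_proportion`, and `prop_biol` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the real enrollment table
school_data_unit_cip <- tibble(
  UNITID  = c(1, 1, 1, 2, 2, 3, 3),
  CIPCODE = c("26", "26", "52", "52", "11", "26", "11"),
  EFTOTLT = c(10, 20, 70, 40, 60, 5, 15)
)

# The two summaries from the question
total_enrollment <- school_data_unit_cip %>%
  group_by(UNITID) %>%
  summarise(Freq = sum(EFTOTLT))

total_biol_enrollment <- school_data_unit_cip %>%
  filter(CIPCODE == "26") %>%
  group_by(UNITID) %>%
  summarise(Freq = sum(EFTOTLT))

# Option 1: keep only the schools that also appear in the biology tibble
enrollment_biol_schools <- total_enrollment %>%
  semi_join(total_biol_enrollment, by = "UNITID")

# Option 2: join the two tibbles and compute the proportion in one step
biol_proportion <- total_enrollment %>%
  inner_join(total_biol_enrollment, by = "UNITID",
             suffix = c("_total", "_biol")) %>%
  mutate(prop_biol = Freq_biol / Freq_total)
```

`semi_join()` only filters rows and adds no columns, while `inner_join()` keeps the same schools and puts both totals side by side, which may be more convenient if the proportion is the end goal.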
