2

This is a follow up of this question. How can I filter all participants (in this case, schools) that matches ALL factors in my desired subset ? Not one or another.

  • edit = subset was up to Level 04 (1 up to 4), not Level 05

  • my dataframe looks like this:

quest40_2[1:10,]
# A tibble: 10 x 4
   SCHOOL  Q9    Q11     Q40               
   <glue>  <fct> <fct>   <fct>             
 1 School1 typeB level0  NA                
 2 School1 typeB level01 NA                
 3 School1 typeB level02 NA                
 4 School1 typeB level03 NA                
 5 School1 typeB NA      plan1_level0upto02
 6 School1 typeB NA      plan2_level_03    
 7 School1 typeB NA      plan3_level_04    
 8 School2 typeB level01 NA                
 9 School2 typeB level02 NA                
10 School2 typeB level03 NA     
  • edit for clarification: desired output:
A tibble: 12 x 4
   SCHOOL  Q9    Q11     Q40  
   <glue>  <fct> <fct>   <fct>
 1 School2 typeB level01 NA   
 2 School2 typeB level02 NA   
 3 School2 typeB level03 NA   
 4 School2 typeB level04 NA   
 5 School3 typeB level01 NA   
 6 School3 typeB level02 NA   
 7 School3 typeB level03 NA   
 8 School3 typeB level04 NA   
 9 School5 typeC level01 NA   
10 School5 typeC level02 NA   
11 School5 typeC level03 NA   
12 School5 typeC level04 NA 
  • Question I want to subset all SCHOOLs that have level01, level02 level03 AND level04 in Q11 levels (such as the operator &&, for example). Hence, I cannot have schools that offer one of them, but not all. Any ideas?(preferably, tidyverse ones)

  • data is in the linked post above

Larissa Cury
  • 806
  • 2
  • 11

2 Answers2

1

We can use filter the 'SCHOOL' having all the custom levels in 'Q11' (With dplyr 1.1, can use .by in filter

library(dplyr) # version >= 1.1.0
quest40_2 %>%
    filter(all(c("level01", "level02", "level04", "level05") %in% Q11),
     .by = "SCHOOL")

-output

# A tibble: 96 × 4
   SCHOOL  Q9    Q11     Q40               
   <glue>  <fct> <fct>   <fct>             
 1 School3 typeB level01 <NA>              
 2 School3 typeB level02 <NA>              
 3 School3 typeB level03 <NA>              
 4 School3 typeB level04 <NA>              
 5 School3 typeB level05 <NA>              
 6 School3 typeB <NA>    plan1_level0upto02
 7 School3 typeB <NA>    plan2_level_03    
 8 School3 typeB <NA>    plan3_level_04    
 9 School3 typeB <NA>    plan4_level_05    
10 School5 typeC level01 <NA>              
# … with 86 more rows

If we want to further filter with only those levels

quest40_2 %>%
    filter(all(c("level01", "level02", "level04", "level05") %in% 
     Q11), Q11 %in% c("level01", "level02", "level04", "level05"), .by = 'SCHOOL')

-output

# A tibble: 44 × 4
   SCHOOL   Q9    Q11     Q40  
   <glue>   <fct> <fct>   <fct>
 1 School3  typeB level01 <NA> 
 2 School3  typeB level02 <NA> 
 3 School3  typeB level04 <NA> 
 4 School3  typeB level05 <NA> 
 5 School5  typeC level01 <NA> 
 6 School5  typeC level02 <NA> 
 7 School5  typeC level04 <NA> 
 8 School5  typeC level05 <NA> 
 9 School13 typeD level01 <NA> 
10 School13 typeD level02 <NA> 
# … with 34 more rows

For earlier versions, use group_by

quest40_2 %>%
  group_by(SCHOOL) %>%
   filter(all(c("level01", "level02", "level04", "level05") %in% Q11)) %>%
  ungroup

-output

# A tibble: 96 × 4
   SCHOOL  Q9    Q11     Q40               
   <glue>  <fct> <fct>   <fct>             
 1 School3 typeB level01 <NA>              
 2 School3 typeB level02 <NA>              
 3 School3 typeB level03 <NA>              
 4 School3 typeB level04 <NA>              
 5 School3 typeB level05 <NA>              
 6 School3 typeB <NA>    plan1_level0upto02
 7 School3 typeB <NA>    plan2_level_03    
 8 School3 typeB <NA>    plan3_level_04    
 9 School3 typeB <NA>    plan4_level_05    
10 School5 typeC level01 <NA>              
# … with 86 more rows

Instead of filtering the 'SCHOOL's if we need to only the filter only the custom levels, just do

quest40_2 %>%
    filter(Q11 %in% c("level01", "level02", "level04", "level05"))

-output

# A tibble: 115 × 4
   SCHOOL  Q9    Q11     Q40  
   <glue>  <fct> <fct>   <fct>
 1 School1 typeB level01 <NA> 
 2 School1 typeB level02 <NA> 
 3 School2 typeB level01 <NA> 
 4 School2 typeB level02 <NA> 
 5 School2 typeB level04 <NA> 
 6 School3 typeB level01 <NA> 
 7 School3 typeB level02 <NA> 
 8 School3 typeB level04 <NA> 
 9 School3 typeB level05 <NA> 
10 School4 typeB level02 <NA> 
# … with 105 more rows
akrun
  • 874,273
  • 37
  • 540
  • 662
  • hi! @akrun , thank you, but why am I getting NAs in all ```Q40s``` values? – Larissa Cury Feb 18 '23 at 00:33
  • Hi, @akrun! another thing. I'm trying to filter the schools that has all levels in the subset, when I tried your code, I got ```School13``` and ```School28```, for example, which also has ```level 06``` – Larissa Cury Feb 18 '23 at 10:28
  • @LarissaCury We are only `filter`ing based on the Q11 values, so the NAs are already there in the input data – akrun Feb 18 '23 at 13:43
  • @LarissaCury Regarding the second question, my understanding was that you want to subset SCHOOL that have all the custom levels you mentioned. I didn't know that you want to exclude those having extra levels. Can you please clarify that part? Also, if you can show what also SCHOOL should be in the expected, it will be helpful – akrun Feb 18 '23 at 13:46
  • @LarissaCury if you are futher filtering, you get 44 rows as in the updsate – akrun Feb 18 '23 at 14:47
  • hi, @arkun ! Sure. I need to subset the ```schools``` that only offer all the required levels. Hence: 1) I cannot have schools that have either of them, but rather I need only the ones which offer *all* of them and 2) I cannot have schools that have levels below or above the required ones in the subset, only the schools that *only* have the subset required levels. Is it clear now? please let me know and thanks for the help! – Larissa Cury Feb 18 '23 at 15:12
  • oh, and I cannot filter only the custom ```levels``` because I need to exclude the ```schools``` which do not meet the criteria from the subset – Larissa Cury Feb 18 '23 at 15:13
  • @LarissaCury Can you check the updated output. If I do the filter with only 4 levels and those particular levels only found in the SCHOOL, there are none in the example, as all the SCHOOL have either more or less – akrun Feb 18 '23 at 15:14
  • @LarissaCury Can you update your post with the expected output as it is not clear what you wanted exaclty – akrun Feb 18 '23 at 15:16
  • 1
    yes!! that was what I needed! The strange thing is the q40 column, it should contain NAs, I'll investigate that...Thank you very much, I'll edit the post! – Larissa Cury Feb 18 '23 at 15:26
  • 1
    @LarissaCury those NAs could be an input issue as we are not sorting the rows of columns separately here – akrun Feb 18 '23 at 15:27
1

we can make a vector of Q11 values to keep and use for filter() fanction from dplyr.

library(dplyr)
tokeep<-c("level01", "level02", "level04", "level05")

quest40_2 %>% group_by(SCHOOL) %>%
 filter(Q11 %in% tokeep) %>% ungroup()

few rows of the output:

# A tibble: 115 × 4
   SCHOOL  Q9    Q11     Q40  
   <glue>  <fct> <fct>   <fct>
 1 School1 typeB level01 NA   
 2 School1 typeB level02 NA   
 3 School2 typeB level01 NA   
 4 School2 typeB level02 NA   
 5 School2 typeB level04 NA   
 6 School3 typeB level01 NA   
 7 School3 typeB level02 NA   
 8 School3 typeB level04 NA   
 9 School3 typeB level05 NA   
10 School4 typeB level02 NA   
# … with 105 more rows
S-SHAAF
  • 1,863
  • 2
  • 5
  • 14
  • hi , @SALAR , thanks! why are we getting NAs in all ```Q40s``` values? – Larissa Cury Feb 18 '23 at 00:47
  • If you check the original data, most of values present in Q11 are missing in Q40 and vice versa. So, most of values would be droped from the filtered data because you are filtering only for specific values in Q11 which are actually missing in Q40. Hope it could helps@LarissaCury – S-SHAAF Feb 18 '23 at 00:54
  • thanks for the reply, but I have just noticed another thing. See, I guess that your code is doing the same thing as mine, meaning: we're capturing all schools that offer any level between ```level 01``` and ```level 05```, but not the schools that offer *all* levels from 01 to 05. Any ideas?v Ex. School 01 and 02 shouldn't be considered – Larissa Cury Feb 18 '23 at 10:51