table(mtcars$cyl)

 4  6  8 
11  7 14 

Suppose I want to filter out low-frequency values, in this case those occurring fewer than 10 times. Is there an elegant, dplyr-esque way to do this?

mtcars %>% group_by(cyl) %>% filter([???])

The result would be a data frame containing only the 4- and 8-cylinder cars, since both of those values occur 10 or more times.
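One way to fill in the [???] placeholder above, offered as a sketch rather than as the answer's own method: inside a grouped filter(), dplyr's n() evaluates to the number of rows in the current group, so whole groups can be kept or dropped by their size in a single step. The object name frequent_cyl is a hypothetical name chosen for illustration.

```r
library(dplyr)

# Inside a grouped filter(), n() returns the size of the current group,
# so every row of a cyl value with fewer than 10 rows is dropped.
frequent_cyl <- mtcars %>%
  group_by(cyl) %>%
  filter(n() >= 10) %>%
  ungroup()   # return an ungrouped result

# Only the 4- and 8-cylinder groups should remain.
table(frequent_cyl$cyl)
```

This avoids the temporary count column, at the cost of being slightly less explicit about what is being counted.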

Doug Fir
  • Possible duplicate of [Return df with a columns values that occur more than once](https://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-than-once); also [Returning observations that only occur once in a group in R](https://stackoverflow.com/questions/36145061/returning-observations-that-only-occur-once-in-a-group-in-r) and [Subset data frame based on number of rows per group](https://stackoverflow.com/questions/20204257/subset-data-frame-based-on-number-of-rows-per-group) – Ronak Shah Dec 19 '17 at 05:02
  • What's the protocol here? I would delete the question since people are downvoting it; however, it has already been answered, so deleting it would be unfair to the answerer. Also, when I first tried to solve this I searched Google using the keyword "frequency", and none of the above answers came up, so who knows, maybe this question will help people who search with that term. – Doug Fir Dec 19 '17 at 06:50

1 Answer


Group by cyl, count the rows in each group, filter on that count, and optionally remove the freq column:

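library(dplyr)

mtcars %>% 
  group_by(cyl) %>% 
  mutate(freq = n()) %>%   # freq = number of rows sharing this row's cyl value
  ungroup() %>% 
  filter(freq >= 10) %>%   # keep values that occur 10 or more times
  select(-freq)            # optionally drop the helper column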
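A more compact variant of the same idea, sketched under the assumption that your dplyr version provides add_count(): it appends a column named n holding each row's group size without leaving the data grouped, so the group_by()/ungroup() pair is not needed. As before, frequent_cyl is a hypothetical name used only for illustration.

```r
library(dplyr)

# add_count(cyl) adds a column `n` with the number of rows sharing each
# row's cyl value; the result is not grouped, so no ungroup() is required.
frequent_cyl <- mtcars %>%
  add_count(cyl) %>%
  filter(n >= 10) %>%   # keep cyl values occurring 10 or more times
  select(-n)            # drop the helper count column
```

The behaviour should match the answer's pipeline; the choice between the two is mostly about readability.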
neilfws