Randomly select an observation from each frequency bin

Question

I'd like to divide my data into 100 frequency bins, and then select a random observation from each frequency bin.

I have a data frame containing words and their frequencies in a corpus, like so:

word | frequency
---- | ---------
a    | 72387
and  | 112091
that | 87164
to   | 71474
the  | 98422
etc.

I know that I can bin the data using the cut function, but I'm not sure how to then select one word randomly from each frequency bin.

Don't put answers in the question - explain specifically why the duplicate doesn't apply in the question, then you can post the answer when it's reopened. — jonrsharpe, Sep 29 '19 at 17:27

score 1 · Answer 1 · answered Sep 26 '19 at 22:47

A tidyverse answer would be:

library(dplyr)

d <- iris %>%
  mutate(bin = ntile(Species, 100)) %>%  # assign each row to one of 100 rank-based bins
  group_by(bin) %>%
  sample_n(1) %>%                        # draw one random row from each bin
  ungroup()

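You can replace "iris" with your data frame and "Species" with the column you would like to bin by (in your case, frequency). Note that ntile() ranks the rows and splits them into 100 groups of roughly equal size, which is different from the equal-width intervals that cut() produces.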
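
If equal-width bins from cut() are what you want, the same pattern should carry over. Here is a minimal sketch, not a definitive implementation: the `words` data frame is made-up toy data standing in for your corpus table, and it assumes dplyr 1.0.0 or later, where slice_sample() supersedes sample_n().

```r
library(dplyr)

# Hypothetical toy data standing in for the word/frequency table in the question
set.seed(42)
words <- data.frame(
  word = paste0("w", 1:1000),
  frequency = sample(1:100000, 1000),
  stringsAsFactors = FALSE
)

picked <- words %>%
  # cut() with a single number splits the range of `frequency` into that many
  # equal-width intervals; labels = FALSE returns integer bin codes
  mutate(bin = cut(frequency, breaks = 100, labels = FALSE)) %>%
  group_by(bin) %>%
  slice_sample(n = 1) %>%   # one random row per non-empty bin
  ungroup()
```

With equal-width bins, some intervals may contain no words at all (word frequencies are usually very skewed), so `picked` can have fewer than 100 rows. Empty bins simply do not appear in the result.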

Sonali J
