
I'd like to divide my data into 100 frequency bins, and then select a random observation from each frequency bin.

I have a data frame containing words and their frequencies in a corpus, like so:

word | frequency
---- | ---------
a    | 72387
and  | 112091
that | 87164
to   | 71474
the  | 98422
etc.

I know that I can bin the data using the cut function, but I'm not sure how to then select one word randomly from each frequency bin.

jonrsharpe
dwhieb
Don't put answers in the question - explain specifically why the duplicate doesn't apply in the question, then you can post the answer when it's reopened. – jonrsharpe Sep 29 '19 at 17:27

1 Answer


A tidyverse answer would be:

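library(dplyr)

d <- iris %>%
  mutate(bin = ntile(Sepal.Length, 100)) %>%  # split rows into 100 roughly equal-sized bins
  group_by(bin) %>%
  sample_n(1) %>%                             # draw one random row from each bin
  ungroup()

You can replace "iris" with your df and "Sepal.Length" with the column you would like to bin by (in your case, frequency). Note that ntile() makes bins holding roughly equal numbers of rows, whereas cut() makes bins of equal width.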
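
Since the question mentions cut, here is a minimal base R sketch of the equal-width variant. The data frame below is made up purely for illustration (the real one would have the word and frequency columns from the question), and set.seed is only there to make the example repeatable. Empty bins simply yield no row, so the result may have fewer than 100 rows.

```r
set.seed(42)  # only so the example is repeatable

# Hypothetical stand-in for the asker's data: 1,000 made-up words and counts
df <- data.frame(
  word = paste0("w", 1:1000),
  frequency = sample(1:100000, 1000, replace = TRUE)
)

# Assign each row to one of 100 equal-width frequency bins (integer codes 1..100)
bins <- cut(df$frequency, breaks = 100, labels = FALSE)

# For each non-empty bin, pick one row index at random.
# sample.int() is used instead of sample() so that a bin holding a single
# row index i is not misread as "sample from 1:i".
idx <- as.vector(tapply(
  seq_len(nrow(df)), bins,
  function(i) i[sample.int(length(i), 1)]
))

picked <- df[idx, ]
picked$bin <- bins[idx]
```

Here picked holds one randomly chosen word per non-empty bin, along with the bin number it came from.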

Sonali J