subsetting df into n-bins based on column values

Question

First and foremost, thank you for taking the time to view my post, whether or not you respond.

Having looked around for an answer to this question, I have been unable to find a specific solution to my problem.

I have a dataset that mirrors the data below (please note that the values in the diff column are made up; laziness gets the best of us all):

id     Date     score     lag_score    diff
01    1/1/18    .26        .4367       -.674
012   1/1/18    .176       .2038       -.156
101   1/1/18    .375       .83         -1.22
56    1/1/18    .64        .24         .6178
43    1/1/18    .18        .1505       .204
...     ...      ..        ...         ...

Essentially, I have a df of many different IDs (the date stays as 1/1/18), and my goal is to create 5 new dfs that split my df equally based on the diff column: the highest diff values in the first bin, the smallest in the last bin, and gradually decreasing values in the middle bins. All of the bins should contain roughly the same number of IDs.

Ideally, I would like something that automatically splits my df into 5 bins. However, if there is a way to mutate a BINS column onto my df, I wouldn't mind writing the subset functions after the fact, since my number of bins is relatively small (5). The result might look something like this:

id     Date     score     lag_score    diff     BINS
01     1/1/18   .26       .4367        -.674    5
012    1/1/18   .176      .2038        -.156    4
101    1/1/18   .375      .83          -1.22    5
56     1/1/18   .64       .24          .6178    1
43     1/1/18   .18       .1505        .204     2
...    ...      ...       ...          ...      ...

Currently, I have the following code. However, it produces very volatile results and not what I intend, possibly because I am using the second argument of findInterval incorrectly.

df <- split(df, findInterval(df$diff, floor(min(df$diff)):0))
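
A note on what that line does, in case it helps narrow things down: `floor(min(df$diff)):0` builds integer break points from the floored minimum up to 0, so `findInterval` groups rows by which integer interval their diff falls in, rather than into five equally sized groups. Below is a minimal sketch using only the five sample rows from the table above; this `df` is a hypothetical stand-in for the real data, which may behave differently.

```r
# Hypothetical stand-in built from the five sample rows in the question
df <- data.frame(
  id   = c("01", "012", "101", "56", "43"),
  diff = c(-0.674, -0.156, -1.22, 0.6178, 0.204)
)

# Break points are the integers from floor(min(diff)) up to 0
breaks <- floor(min(df$diff)):0
print(breaks)

# Each value gets the index of the integer interval it falls in;
# every value >= 0 lands in the same final group
grp <- findInterval(df$diff, breaks)
print(grp)

# The number and size of the groups depend on the spread of the data,
# not on a target of 5 bins
print(table(grp))
```

The number of groups here is driven by how many integers lie between the minimum and 0, which would explain results that change a lot from one dataset to the next.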

You're looking for the function `cut`, probably used in conjunction with the function `quantile` to define the breaks. — joran, Oct 22 '18 at 19:39
@joran thank you, and sorry about the duplication. I will check those questions out and get back to you. Thank you again — yungpadewon, Oct 22 '18 at 19:42
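
For reference, the `cut` plus `quantile` combination suggested in the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the data frame is randomly generated stand-in data, and the names `n_bins`, `brks`, `bin_low_to_high`, and `bins` are made up for the example. It also assumes the diff values have few ties; with many tied values the quantile break points can coincide, and `cut` will then stop with an error about non-unique breaks.

```r
# Hypothetical stand-in data: 20 rows with made-up diff values
set.seed(1)
df <- data.frame(id = 1:20, diff = rnorm(20))

n_bins <- 5

# Quantile break points at 0%, 20%, ..., 100% of diff
brks <- quantile(df$diff, probs = seq(0, 1, length.out = n_bins + 1))

# cut() assigns each row to an interval between consecutive breaks;
# include.lowest = TRUE keeps the minimum value in the first interval,
# labels = FALSE returns integer codes (1 = lowest diff values)
bin_low_to_high <- cut(df$diff, breaks = brks,
                       include.lowest = TRUE, labels = FALSE)

# Reverse the codes so bin 1 holds the highest diff and bin 5 the lowest
df$BINS <- n_bins + 1 - bin_low_to_high

# One data frame per bin, returned as a list named "1" to "5"
bins <- split(df, df$BINS)

# Number of rows in each bin
print(sapply(bins, nrow))
```

The individual pieces can then be pulled out with `bins[["1"]]`, `bins[["2"]]`, and so on, or the `BINS` column can be used directly for subsetting.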
