First and foremost, thank you for taking the time to view my post, whether or not you respond.
Having looked around for an answer to this question, I am unable to find a specific solution to my problem:
I have a dataset that mirrors the data below (please note the values in the diff column are made up; laziness gets the best of us all):
id Date score lag_score diff
01 1/1/18 .26 .4367 -.674
012 1/1/18 .176 .2038 -.156
101 1/1/18 .375 .83 -1.22
56 1/1/18 .64 .24 .6178
43 1/1/18 .18 .1505 .204
... ... .. ... ...
Essentially, I have a df of many different IDs (the date stays as 1/1/18), and my goal is to create 5 new dfs that split my df equally based on the diff column: the highest diff values in the first bin, the smallest diff values in the last bin, and gradually decreasing values in the middle bins. All of the bins should have roughly the same number of IDs.
Ideally, I would like something that automatically splits my df into 5 bins. However, if there is a way to mutate a BINS column onto my df, I wouldn't mind writing the subset functions after the fact, since my number of bins is relatively small (5). That might look something like this:
id Date score lag_score diff BINS
01 1/1/18 .26 .4367 -.674 5
012 1/1/18 .176 .2038 -.156 4
101 1/1/18 .375 .83 -1.22 5
56 1/1/18 .64 .24 .6178 1
43 1/1/18 .18 .1505 .204 2
... ... .. ... ... ...
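For illustration, here is a minimal base R sketch of how such a BINS column could be built from ranks. It assumes the data frame is called df, that ties in diff can be broken by row order, and that bin 1 should hold the highest diff values. The data are the made-up rows from the table above; with only five rows each bin holds a single id, so the labels will not match the illustrative table exactly.

```r
# Toy data copied from the example table (values are made up)
df <- data.frame(
  id        = c("01", "012", "101", "56", "43"),
  Date      = "1/1/18",
  score     = c(.26, .176, .375, .64, .18),
  lag_score = c(.4367, .2038, .83, .24, .1505),
  diff      = c(-.674, -.156, -1.22, .6178, .204),
  stringsAsFactors = FALSE
)

n_bins <- 5

# Rank from highest diff (rank 1) to lowest; ties broken by row order
r <- rank(-df$diff, ties.method = "first")

# Map ranks 1..n onto bins 1..n_bins with near-equal counts per bin
df$BINS <- ceiling(r * n_bins / nrow(df))

# One data frame per bin, if separate objects are wanted
bins <- split(df, df$BINS)
```

Because the bins are derived from ranks rather than from the diff values themselves, the bin sizes differ by at most one row regardless of how the diff values are distributed.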
Currently, I have the following code. However, it produces very volatile results and does not do what I intend, possibly because I am using the second argument of findInterval incorrectly.
df <- split(df, findInterval(df$diff, floor(min(df$diff)):0))
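One possible explanation for the volatility is that floor(min(df$diff)):0 produces integer break points (for the toy values, -2, -1, 0), so rows are grouped by which unit-wide interval diff falls into rather than by count, and every positive diff lands in the same last group. The sketch below contrasts that with quantile-based break points. It assumes the diff values are mostly distinct and that bin 1 should hold the highest values; the names old_breaks, new_breaks and grp are only illustrative.

```r
# Toy diff values from the example table (made up)
df <- data.frame(
  id   = c("01", "012", "101", "56", "43"),
  diff = c(-.674, -.156, -1.22, .6178, .204),
  stringsAsFactors = FALSE
)

# Breaks as in the original call: integers from floor(min) up to 0
old_breaks <- floor(min(df$diff)):0
old_groups <- findInterval(df$diff, old_breaks)

# Quantile breaks: 4 interior cut points give 5 groups of similar size
n_bins <- 5
probs  <- seq(0, 1, length.out = n_bins + 1)
new_breaks <- quantile(df$diff, probs = probs[-c(1, n_bins + 1)], names = FALSE)

# findInterval returns 0..4 here (lowest diff = 0); flip so highest diff = bin 1
grp <- findInterval(df$diff, new_breaks)
df$BINS <- n_bins - grp

# List of data frames, one per bin
bins <- split(df, df$BINS)
```

On the toy data, old_groups contains only three distinct groups, while the quantile version puts one id in each of the five bins. With heavily tied diff values, quantile break points can coincide and leave some bins empty, so a rank-based assignment may be a safer choice in that case.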