
Given a dataset with a non-uniform (highly peaked) distribution, I want to resample it to create a new dataset with an approximately uniform distribution. My approach:

  1. Divide the data into bins.
  2. Set the target bin level to the smallest number of samples per bin, among all bins.
  3. Randomly delete samples from each bin until its count equals the target bin level.
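The three steps above can be sketched in Python with NumPy roughly as follows. This is only a minimal illustration: the function name `undersample_to_uniform`, the `n_bins` and `seed` parameters, and the use of equal-width bins are assumptions, and empty bins are skipped (otherwise the target level would be zero).

```python
import numpy as np

def undersample_to_uniform(data, n_bins=10, seed=0):
    """Randomly drop samples so every non-empty bin keeps the same count."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Step 1: assign each sample to one of n_bins equal-width bins.
    edges = np.linspace(data.min(), data.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(data, edges) - 1, 0, n_bins - 1)
    # Step 2: target level = smallest count among the non-empty bins
    # (assumption: empty bins are ignored).
    counts = np.bincount(bin_idx, minlength=n_bins)
    target = counts[counts > 0].min()
    # Step 3: randomly keep `target` samples from each non-empty bin.
    keep = [
        rng.choice(np.flatnonzero(bin_idx == b), size=target, replace=False)
        for b in np.flatnonzero(counts)
    ]
    return data[np.sort(np.concatenate(keep))]
```

Note that with a highly peaked distribution the sparsest bin may hold very few samples, so this can discard most of the data.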

Is there a better technique?

Ron Cohen

1 Answer


For a uniform distribution on the interval [a, b] we have

mean = (a+b) / 2

variance = (b-a)^2 / 12

So you could choose a and b and sample from a uniform distribution with those parameters. For example, set a = min(data) and b = max(data), or perhaps a = mean(lowest_bin) and b = mean(highest_bin). How you set a and b depends on your data and what you want to accomplish.
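As a rough sketch of this suggestion, assuming NumPy and the a = min(data), b = max(data) choice (the normal `data` array here is only a stand-in for a peaked dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)           # stand-in for a peaked dataset

# One possible choice of endpoints: the range of the observed data.
a, b = data.min(), data.max()

# Draw new values uniformly on [a, b).
synthetic = rng.uniform(a, b, size=100_000)

# Theoretical moments of the uniform distribution on [a, b].
expected_mean = (a + b) / 2
expected_var = (b - a) ** 2 / 12

print(synthetic.mean(), expected_mean)
print(synthetic.var(), expected_var)
```

The values in `synthetic` are fresh draws from the uniform distribution; they are not a subset of the original samples.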

digestivee
  • For simplicity let's assume my data has mean = 0 and ranges from -1 to +1. It sounds like you are saying to choose random samples from a uniform distribution between -1 and +1. But such samples do not correspond with samples in my data. Are you saying to choose the random samples from the uniform distribution and then choose samples from my data that are closest to the values pulled from the uniform distribution? – Ron Cohen Aug 29 '17 at 14:47
  • Hmm, if you still want to sample from the original data, then it is probably better to do something like what you've done. Let's pretend you have 3 bins: the first contains 1 item, the second contains 2 items, and the third contains 3 items. Then I would make sure that each bin has probability 1/3, so the item in bin1 has P = 1/3, the two items in bin2 each have P = 1/6 (so together they have 1/3), and the three items in bin3 each have P = 1/9. This way you don't need to remove data points; you simply weight them so that each bin is chosen with the same probability, which should give an approximately uniform distribution. – digestivee Aug 30 '17 at 06:17
  • This answer does not seem to actually address the question. Why was it chosen? – Joshua Dempster Jun 17 '20 at 21:41
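The weighting scheme from digestivee's comment (every bin gets the same total probability, split evenly among its items) could look something like this in NumPy. The function name `bin_balanced_weights`, the equal-width binning, and the sample values are assumptions made for illustration; the values are chosen so the three bins hold 1, 2, and 3 items, as in the comment.

```python
import numpy as np

def bin_balanced_weights(data, n_bins=10):
    """Per-sample probabilities giving every non-empty bin equal total mass."""
    data = np.asarray(data)
    edges = np.linspace(data.min(), data.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(data, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_idx, minlength=n_bins)
    n_nonempty = np.count_nonzero(counts)
    # Each non-empty bin gets 1 / n_nonempty, shared equally by its items.
    return 1.0 / (n_nonempty * counts[bin_idx])

# Three bins holding 1, 2 and 3 items respectively.
data = np.array([0.0, 1.0, 1.1, 2.0, 2.1, 2.2])
weights = bin_balanced_weights(data, n_bins=3)
print(weights)  # 1/3, then 1/6 twice, then 1/9 three times

# Resample the original points (with replacement) using these weights.
rng = np.random.default_rng(0)
resampled = rng.choice(data, size=9000, p=weights)
```

Because sampling is with replacement, no data points are discarded, but points in sparse bins will be repeated often in the resampled set.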