
I have an array, vox_betas, that contains 21600 floats (ranging from ~0 to ~2). When it is plotted against a second array, features, you can see that there is a structure to the data (see 1st picture).

I want a single array that reflects this structure: essentially, I want to call sns.distplot() on it and get a distribution shaped like the first picture. Right now, sns.distplot(vox_betas) produces the 2nd picture, which is not what I want.

I was able to accomplish this in the third picture by creating the array, dist, but the way I accomplished this was sloppy and even loses some information (my code is below).

How would you transform vox_betas and features into dist? Does anyone have any ideas?

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.scatter(features, vox_betas)

[1st picture: scatter plot of vox_betas against features]

sns.distplot(vox_betas)

[2nd picture: sns.distplot(vox_betas)]

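dist = []
for f in np.unique(features):
    # Total beta for this feature value, scaled by 10 and rounded to an integer
    # count (np.repeat expects an integer; this rounding loses information)
    n = int(round(np.sum(vox_betas[features == f]) * 10))
    dist.append(np.repeat(f, n))
dist = np.concatenate(dist)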

sns.distplot(dist)

[3rd picture: sns.distplot(dist)]
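A minimal sketch of the same transformation done in one vectorised pass, assuming features and vox_betas are 1-D NumPy arrays of equal length and that vox_betas is non-negative. The helper name betas_to_dist and the toy data are hypothetical; the scale factor of 10 mirrors the loop above, and rounding to integer counts is still where information is lost.

```python
import numpy as np


def betas_to_dist(features, vox_betas, scale=10):
    """Repeat each unique feature value in proportion to its summed beta.

    Hypothetical helper: value f appears round(scale * sum(vox_betas[features == f]))
    times, like the original loop, but built in one vectorised pass.
    """
    features = np.ravel(features)
    vox_betas = np.ravel(np.asarray(vox_betas, dtype=float))
    # uniq: sorted unique feature values; inverse: position of each element in uniq
    uniq, inverse = np.unique(features, return_inverse=True)
    # Sum the betas belonging to each unique feature value
    totals = np.bincount(inverse, weights=vox_betas, minlength=len(uniq))
    # np.repeat needs integer counts; rounding here is the only information loss
    counts = np.rint(totals * scale).astype(int)
    return np.repeat(uniq, counts)


# Small made-up example: three feature values with different total beta
features = np.array([1, 1, 2, 3, 3, 3])
vox_betas = np.array([0.2, 0.3, 1.0, 0.1, 0.1, 0.1])
dist = betas_to_dist(features, vox_betas)
print(np.unique(dist, return_counts=True))
```

If the goal is only the plot, a weighted histogram avoids both the expanded array and the rounding: np.histogram(features, bins=<your bins>, weights=vox_betas), or in seaborn 0.11 and later (where distplot is deprecated) sns.histplot(x=features, weights=vox_betas, discrete=True).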

Paul Scotti

1 Answer


This is called inverse transform sampling, which "is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function."

The best explanation I found is this one. Also discussed here.
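A minimal sketch of discrete inverse transform sampling with NumPy, assuming non-negative weights. The helper inverse_transform_sample is a hypothetical name: it builds the CDF with np.cumsum, draws uniform numbers in [0, 1), and maps each one back through the CDF with np.searchsorted. For this question, the support would be the unique values of features and the weights the summed vox_betas per value; the toy arrays below are made up.

```python
import numpy as np


def inverse_transform_sample(values, weights, n, rng=None):
    """Draw n samples from a discrete distribution via inverse transform sampling.

    values:  support points (e.g. the unique feature values)
    weights: non-negative, not necessarily normalised, mass for each value
    """
    rng = np.random.default_rng(rng)
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)
    # Cumulative distribution function, normalised so the last entry is 1.0
    cdf = np.cumsum(weights)
    cdf /= cdf[-1]
    # Uniform draws in [0, 1)
    u = rng.random(n)
    # Invert the CDF: index of the first value whose CDF exceeds u
    idx = np.searchsorted(cdf, u, side="right")
    return values[idx]


# Example: support and weights built from made-up features / vox_betas
features = np.array([1, 1, 2, 3, 3, 3])
vox_betas = np.array([0.2, 0.3, 1.0, 0.1, 0.1, 0.1])
uniq, inverse = np.unique(features, return_inverse=True)
mass = np.bincount(inverse, weights=vox_betas)
samples = inverse_transform_sample(uniq, mass, 10_000, rng=0)
```

Unlike repeating each value a rounded number of times, this draws any number of samples n, so no mass is discarded by rounding; the trade-off is sampling noise.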

Joe
  • This is very similar to what I'm after, so thanks for the note; I had not previously heard of inverse transform sampling. However, I don't have a cumulative distribution function to transform here. – Paul Scotti Feb 02 '20 at 00:48
  • You can create the CDF from any distribution you have, see https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#common-methods – Joe Feb 02 '20 at 05:49
  • Or there are non-built-in functions: https://cmdlinetips.com/2019/05/empirical-cumulative-distribution-function-ecdf-in-python/ – Joe Feb 02 '20 at 05:51
  • https://stackoverflow.com/questions/10640759/how-to-get-the-cumulative-distribution-function-with-numpy – Joe Feb 02 '20 at 05:52
  • https://stackoverflow.com/questions/24788200/calculate-the-cumulative-distribution-function-cdf-in-python – Joe Feb 02 '20 at 05:52
  • The easiest way is probably to use `np.cumsum`. – Joe Feb 02 '20 at 05:56
  • You can also search for ECDF (empirical cumulative distribution function), e.g. https://www.youtube.com/watch?v=ap4mfGvgDsM – Joe Feb 02 '20 at 05:58
  • Thank you for the explanation of ECDF! – Paul Scotti Feb 03 '20 at 17:37
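Following the np.cumsum suggestion, a minimal sketch of a weighted empirical CDF, assuming each vox_betas entry is treated as the weight of its features value (an interpretation of the thread, not something it spells out). weighted_ecdf is a hypothetical helper name; with all weights equal to 1 it reduces to the ordinary ECDF.

```python
import numpy as np


def weighted_ecdf(x, weights):
    """Weighted empirical CDF (hypothetical helper).

    Returns the sorted unique values of x and, for each, the fraction of the
    total weight carried by observations at or below that value.
    """
    x = np.ravel(x)
    weights = np.ravel(np.asarray(weights, dtype=float))
    uniq, inverse = np.unique(x, return_inverse=True)
    # Total weight at each unique value, then a running total via np.cumsum
    mass = np.bincount(inverse, weights=weights, minlength=len(uniq))
    cdf = np.cumsum(mass) / mass.sum()
    return uniq, cdf


# With unit weights this is the ordinary ECDF of x
xs, cdf = weighted_ecdf([3, 1, 2, 1], np.ones(4))
```

For this question, x would be features and weights would be vox_betas; plt.step(xs, cdf, where="post") draws the resulting step function, and the (xs, cdf) pair is the input that inverse transform sampling works from.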