I have an array of probability values, some of which are 0. I need to plot a histogram such that each bin contains an equal number of elements. I tried matplotlib's hist function, but it only lets me choose the number of bins. How do I go about plotting this? (A normal plot and hist both work, but they are not what I need.)

I have 10000 entries. Only 200 have values greater than 0, and these lie between 0.0005 and 0.2. The distribution isn't even: only one element has the value 0.2, whereas approximately 2000 have the value 0.0005. So plotting it was an issue, as the bins had to be of unequal width with an equal number of elements in each.

duckvader

1 Answer

The task does not make much sense to me, but the following code does what I understood the task to be.

I also think the last lines of the code are what you really wanted: using different bin widths to improve the visualization (but without targeting an equal number of samples in each bin). I used astroML's hist with bins='blocks' (astropy supports this too).

Code

# Python 3 -> beware the // operator!

import numpy as np
import matplotlib.pyplot as plt
from astroML import plotting as amlp

N_VALUES = 1000
N_BINS = 100

# Create fake data
prob_array = np.random.randn(N_VALUES)
prob_array /= np.max(np.abs(prob_array),axis=0)  # scale a bit

# Sort array
prob_array = np.sort(prob_array)

# Calculate bin borders: every (N_VALUES // N_BINS)-th sorted sample, plus min and max
bin_borders = [np.amin(prob_array)] + [prob_array[(N_VALUES // N_BINS) * i] for i in range(1, N_BINS)] + [np.amax(prob_array)]

print('SAMPLES: ', prob_array)
print('BIN-BORDERS: ', bin_borders)

# Plot hist
counts, x, y = plt.hist(prob_array, bins=bin_borders)
plt.xlim(bin_borders[0], bin_borders[-1] + 1e-2)
print('COUNTS: ', counts)
plt.show()


# And this is what I think you really want

fig, (ax1, ax2) = plt.subplots(2)
left_blob = np.random.randn(N_VALUES // 10) + 3  # integer division: randn needs an int size
right_blob = np.random.randn(N_VALUES) + 110
both = np.hstack((left_blob, right_blob))  # data is hard to visualize with equal bin-widths

ax1.hist(both)
amlp.hist(both, bins='blocks', ax=ax2)
plt.show()

Output

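[Figure 1: histogram with unequal bin widths and an equal number of samples per bin. Figure 2: classic histogram (top) compared with the 'blocks' histogram (bottom).]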
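As an alternative to indexing the sorted array by hand, the equal-count bin borders can also be sketched with quantiles. This is a minimal sketch, not part of the code above: the helper name `equal_count_edges` and the fake data are hypothetical, and it assumes you only want to bin the non-zero probabilities (thousands of identical zeros would otherwise collapse several borders onto the same value).

```python
import numpy as np

def equal_count_edges(values, n_bins):
    """Return bin edges that put roughly the same number of samples in each bin."""
    values = np.asarray(values, dtype=float)
    # Evenly spaced quantile levels 0, 1/n, ..., 1 give equal-frequency edges.
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    # Heavily repeated values would produce duplicate edges; drop them.
    return np.unique(edges)

rng = np.random.default_rng(0)

# Hypothetical stand-in for the question's data: mostly zeros, a few small positives.
probs = np.zeros(10_000)
probs[:200] = rng.uniform(0.0005, 0.2, size=200)

positive = probs[probs > 0]  # bin only the non-zero probabilities
edges = equal_count_edges(positive, n_bins=10)
counts, _ = np.histogram(positive, bins=edges)

# With equal counts every bar has the same height, so the information
# sits in the bin widths. Dividing by the width gives a density instead.
density = counts / (counts.sum() * np.diff(edges))
```

The resulting `edges` can be passed to `plt.hist(positive, bins=edges)`. Adding `density=True` there makes the bar heights reflect the bin widths instead of the (constant) counts.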

sascha
  • To clarify the question: I have 10000 entries. Only 200 have values greater than 0, and these lie between 0.0005 and 0.2. The distribution isn't even: only one element has the value 0.2, whereas approximately 2000 have the value 0.0005. So plotting it was an issue, as the bins had to be of unequal width with an equal number of elements in each. – duckvader Aug 31 '16 at 20:27
  • The above plots unequal bins with an equal number of elements, but I doubt that's useful at all. Your question was really badly phrased. I now assume you want a classic histogram (without the constraint of an equal number of elements in each bin, but with unequal bins to improve visualization quality). If that's the case, you should have said so. It's kind of sad to do needless work because of bad phrasing, but well... Look at the second approach, using **astroml** (this is also implemented in **astropy**). – sascha Aug 31 '16 at 20:34
  • Sorry for the phrasing. Thanks – duckvader Aug 31 '16 at 20:38
  • @duckvader So, I updated the code to show the different behaviour between the classic histogram and astroml's hist with `bins='blocks'`. Is this what you wanted to achieve? – sascha Aug 31 '16 at 20:45
  • Yes, this is what I was aiming for. I'll adapt the logic to my problem. – duckvader Aug 31 '16 at 20:47
  • Alright. Google **bayesian blocks histogram** for more information (and to understand if it's a valid approach for your problem). – sascha Aug 31 '16 at 20:49