
What is the formula relating the variables values1 and values2 in the following code?

values1, _ = pylab.histogram(data, bins, density=False)

values2, _ = pylab.histogram(data, bins, density=True)

Put another way: given values1, how can I compute values2 without calling pylab.histogram a second time?

Thanks

tmdavison
mljrg
    To avoid confusion, note that `pylab.histogram` is actually the `numpy` histogram function, not `matplotlib`. – tmdavison Mar 21 '16 at 19:32

1 Answer


With density=True, the histogram is normalized so that it integrates to 1: each bin's count is divided by the total count and by that bin's width.

So to get from your raw counts to the density values, divide by the total count (normalization) and by the bin width (density):

bin_width = bins[1:] - bins[:-1]
values2 = values1 / np.sum(values1) / bin_width
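
The same conversion can be wrapped in a small helper. This is only a sketch: the name counts_to_density is hypothetical, and it takes the bin edges returned by the histogram call (the second return value, discarded as _ in the question) rather than the bins argument, so it also works when bins was given as a list or as a number of bins. Converting the counts to float keeps the division from truncating on Python 2.

```python
import numpy as np

def counts_to_density(counts, edges):
    """Convert raw histogram counts to density values (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)  # float avoids integer division on Python 2
    widths = np.diff(edges)                   # per-bin width, so unequal bins are handled
    return counts / counts.sum() / widths

# Usage: one histogram call, then convert the counts
data = np.array([0, 0, 1, 2, 3, 4, 7, 9])
counts, edges = np.histogram(data, bins=[0, 1, 2, 5, 11])
density = counts_to_density(counts, edges)
```

Because the widths are taken per bin, the helper does not assume equally spaced bins.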

A quick test with a random array:

from matplotlib import pylab
import numpy as np

# 1000 random integers in [0, 10), with deliberately unequal bin widths
data = np.random.randint(0, 10, 1000)
bins = np.array([0, 1, 2, 5, 11])

# Raw counts per bin (the numbers vary from run to run)
values1, _ = pylab.histogram(data, bins, density=False)
print(values1)
# [ 97, 117, 278, 508]

# Density as computed by histogram itself
values2, _ = pylab.histogram(data, bins, density=True)
print(values2)
# [ 0.097, 0.117, 0.09266667, 0.08466667]

# Density computed by hand from the counts
bin_width = bins[1:] - bins[:-1]
print(values1 / np.sum(values1) / bin_width)
# [ 0.097, 0.117, 0.09266667, 0.08466667]

So the manual computation reproduces the density=True output in this case.
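
To check the relation numerically rather than by eye, a sketch along these lines may help. It assumes a fixed seed (chosen arbitrarily) for reproducibility and calls np.histogram directly, since pylab.histogram is the NumPy function. It also checks that the density values integrate to 1 over the bins, which is what the extra division by the bin width guarantees.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen arbitrarily for this sketch
data = rng.randint(0, 10, 1000)
bins = np.array([0, 1, 2, 5, 11])

values1, edges = np.histogram(data, bins, density=False)
values2, _ = np.histogram(data, bins, density=True)

# Manual conversion: divide by the total count and by each bin's width
bin_width = np.diff(edges)
manual = values1 / float(values1.sum()) / bin_width

matches = np.allclose(manual, values2)  # manual formula agrees with density=True
integrates_to_one = np.isclose(np.sum(values2 * bin_width), 1.0)  # total area is 1
print(matches, integrates_to_one)
```

Comparing with np.allclose rather than == avoids spurious mismatches from floating-point rounding.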

MSeifert
  • I suppose you tested in Python 3.x ... I am in Python 2.7.11 and had to do this on the last line of code: `print values1 / (1.0 * np.sum(values1)) / bin_width` – mljrg Mar 21 '16 at 19:56
  • @mljrg -Yes I'm using python3. You haven't tagged it `python 2.x` so I assumed a solution for any python version would be ok. You could also use `from __future__ import division, print_function` to bypass the difference in behaviour. :) – MSeifert Mar 21 '16 at 20:02