I'm interested in plotting the probability distribution of a set of points that follow a power law. Further, I would like to use logarithmic binning to smooth out the large fluctuations in the tail. If I just use logarithmic binning, and plot it on a log-log scale, such as

pl.hist(MyList,log=True, bins=pl.logspace(0,3,50))
pl.xscale('log')

for example, then the problem is that the larger bins account for more points, i.e. the heights of my bins are not scaled by bin size.
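
To make the effect concrete, here is a minimal sketch using evenly spaced (not power-law) values, chosen only because their true density is flat and the distortion is easy to see. The variable names and the sample size are assumptions made for illustration.

```python
import numpy as np

# Evenly spaced samples on [1, 1000): the true density is flat.
data = np.linspace(1, 1000, 100000, endpoint=False)

# Same log-spaced bin edges as in the snippet above.
bins = np.logspace(0, 3, 50)
counts, edges = np.histogram(data, bins=bins)
widths = np.diff(edges)

# Raw counts grow with bin width even though the density is constant.
print("first bin count:", counts[0], "last bin count:", counts[-1])

# Counts per unit width stay roughly constant across the wide bins.
print("per-unit-width, last bins:", counts[-3:] / widths[-3:])
```

The raw counts in the last bin are far larger than in the first one purely because that bin is wider, which is the distortion described above.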

Is there a way to use logarithmic binning and still have Python scale the bar heights by the bin size? I know I can probably do this manually in some roundabout fashion, but it seems like a feature that should already exist, and I can't find it. If you think histograms are fundamentally a bad way to represent my data and you have a better idea, I'd love to hear that too.

Thanks!

SarthakC
  • Do you want a histogram of the logarithm of the data AND you want the y axis scale to be logarithmic? – wwii May 11 '16 at 19:08
  • @wwii: I want to make a histogram on a log-log scale, with log binning as well, so that the histogram on the log-log scale appears to have a uniform bin size – SarthakC May 11 '16 at 19:18
  • Sorry for a bit of off-topic self-promotion, but perhaps you might find useful my library **physt**. Among other features, it provides different binning schemes, one of which is suited for exponentially-distributed values. See http://nbviewer.jupyter.org/github/janpipek/physt/blob/master/doc/Binning.ipynb and https://github.com/janpipek/physt – honza_p May 31 '16 at 16:46

1 Answer

Matplotlib won't help you much if you have special requirements for your histograms. You can, however, easily create and manipulate a histogram with NumPy.

import numpy as np
from matplotlib import pyplot as plt

# something random to plot
data = (np.random.random(10000)*10)**3

# log-scaled bins
bins = np.logspace(0, 3, 50)
widths = (bins[1:] - bins[:-1])

# Calculate histogram (np.histogram returns the counts and the bin edges)
counts, _ = np.histogram(data, bins=bins)
# normalize by bin width
hist_norm = counts/widths

# plot it! align='edge' puts each bar's left side on its bin's left edge
plt.bar(bins[:-1], hist_norm, widths, align='edge')
plt.xscale('log')
plt.yscale('log')
plt.show()

Obviously, when you present your data in a non-obvious way like this (the y axis now shows counts per unit of x, not raw counts), you have to be very careful to label your y axis properly and write an informative figure caption.
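
If a probability density is acceptable instead of counts per unit width, `np.histogram` can do the width scaling itself through its `density=True` argument, which divides each count by the bin width and by the total number of samples that fall inside the bins. The sketch below is only a check of that equivalence; the seed and the variable names are assumptions for illustration, and recent Matplotlib versions accept the same `density=True` argument in `hist`.

```python
import numpy as np

# Seeded so the check is repeatable; same recipe as the data above.
rng = np.random.default_rng(0)
data = (rng.random(10000) * 10) ** 3

bins = np.logspace(0, 3, 50)
counts, edges = np.histogram(data, bins=bins)
widths = np.diff(edges)

# density=True: count / (number of in-range samples * bin width)
density, _ = np.histogram(data, bins=bins, density=True)

# The same quantity computed by hand from the raw counts.
manual = counts / (counts.sum() * widths)
print("matches manual scaling:", np.allclose(density, manual))

# The density integrates to 1 over the binned range.
print("integral over bins:", np.sum(density * widths))
```

Note that values outside the bin range (here, anything below 1) are ignored by the normalization, so the result is a density over the binned range only.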

tjollans
  • Thanks! :) That works for my purpose, though I would prefer a more direct way if it exists. For power-law like data this just seems to be the most natural way for me to represent the data. If there's no better answer involving matplotlib functionality directly in the next day or so I'll accept your answer. – SarthakC May 12 '16 at 19:22