I've got matplotlib installed and am trying to create a histogram plot from some data:

#!/usr/bin/python

l = []
with open("testdata") as f:
    line = f.next()
    f.next()  # skip headers
    nat = int(line.split()[0])
    print nat

    for line in f:
        if line.strip():
            l.append(map(float,line.split()[1:]))

    b = 0
    a = 1

for b in range(53):
    for a in range(b+1, 54):
        import operator
        import matplotlib.pyplot as plt
        import numpy as np

        vector1 = (l[b][0], l[b][1], l[b][2])
        vector2 = (l[a][0], l[a][1], l[a][2])

        x = vector1
        y = vector2
        vector3 = list(np.array(x) - np.array(y))
        dotProduct = reduce( operator.add, map( operator.mul, vector3, vector3))
    
        dp = dotProduct**.5
        print dp
    
        data = dp
        num_bins = 200  # <- number of bins for the histogram
        plt.hist(data, num_bins)
        plt.show()

I'm getting an error from the last part of the code:

/usr/lib64/python2.6/site-packages/matplotlib/backends/backend_gtk.py:621:     DeprecationWarning: Use the new widget gtk.Tooltip
  self.tooltips = gtk.Tooltips()
Traceback (most recent call last):
  File "vector_final", line 42, in <module>
    plt.hist(data, num_bins)
  File "/usr/lib64/python2.6/site-packages/matplotlib/pyplot.py", line 2008, in hist
    ret = ax.hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, **kwargs)
  File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 7098, in hist
    w = [None]*len(x)
TypeError: len() of unsized object

But anyway, do you have any idea how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?

mkrieger1
Wana_B3_Nerd

3 Answers

do you have any idea how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?

You can, for example, use NumPy's arange (or Python's built-in range) for a fixed bin size, and NumPy's linspace for a fixed number of evenly spaced bins. Here are two simple examples from my matplotlib gallery:

Fixed bin size

import numpy as np
from matplotlib import pyplot as plt

data = np.random.normal(0, 20, 1000)

# fixed bin size: edges at -100, -95, ..., 95 (np.arange excludes the stop value)
bins = np.arange(-100, 100, 5)

plt.xlim([min(data)-5, max(data)+5])

plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed bin size)')
plt.xlabel('variable X (bin size = 5)')
plt.ylabel('count')

plt.show()
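
One detail of the fixed-bin-size approach is that np.arange leaves out its stop value, so the last edge above is 95 rather than 100. If the upper limit should itself be a bin edge, a minimal sketch (the names `width`, `lo` and `hi` are only illustrative) could extend the stop by one bin width:

```python
import numpy as np

width = 5            # illustrative bin width
lo, hi = -100, 100   # illustrative data range

# np.arange excludes the stop value, so extend it by one width
# to make `hi` the last bin edge.
edges = np.arange(lo, hi + width, width)

print(edges[0], edges[-1], len(edges) - 1)
```

The resulting array can be passed to plt.hist as `bins=edges` in the same way as above.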

Fixed number of bins

import numpy as np
import math
from matplotlib import pyplot as plt

data = np.random.normal(0, 20, 1000)

# round outwards so no data point falls outside the outermost edges
bins = np.linspace(math.floor(min(data)),
                   math.ceil(max(data)),
                   21)  # 21 edges -> 20 equal-width bins

plt.xlim([min(data)-5, max(data)+5])

plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed number of bins)')
plt.xlabel('variable X (20 evenly spaced bins)')
plt.ylabel('count')

plt.show()
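
Applied to the question, the same idea might look like the sketch below. It assumes the TypeError comes from passing a single float to plt.hist on every loop iteration, so the distances are collected first and binned once. The random `coords` array is a made-up stand-in for the rows read from `testdata`, and np.histogram is used here only so the counts can be inspected without a plot window.

```python
import itertools
import numpy as np

# Hypothetical stand-in for the 54 coordinate rows read from "testdata".
rng = np.random.default_rng(0)
coords = rng.normal(0, 20, size=(54, 3))

# Collect every pairwise distance first instead of plotting inside the loop.
distances = [
    float(np.linalg.norm(coords[a] - coords[b]))
    for b, a in itertools.combinations(range(len(coords)), 2)
]

# 201 evenly spaced edges give 200 equal-width bins.
edges = np.linspace(min(distances), max(distances), 201)
counts, edges = np.histogram(distances, bins=edges)

print(len(distances), len(counts), counts.sum())
```

Calling plt.hist(distances, bins=edges) once, after the loop, should draw the same 200 bins.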

  • I found this to be very helpful. I deleted the lines "import random" without detecting any ill effects. Is it actually needed here? I gather that we call a function called random.normal, but if I understand the script correctly, this function is part of the numpy module. – Carl Christian Jan 01 '18 at 11:47
  • Glad this was helpful! Good point, the `import random` line looked like a stale import and wasn't actually used in that code snippet. Edited in the answer. Thanks! –  Jan 01 '18 at 22:17
  • aren't math.floor and ceil in the wrong order? I would have expected: np.linspace(math.floor(min(data)), math.ceil(max(data)),20) – Zeh Nov 05 '20 at 08:10
Automatic bins

how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?

The accepted answer builds the bin edges manually with np.arange and np.linspace, but matplotlib can already create 200 evenly spaced bins automatically:

  1. plt.hist itself returns counts and bins

    counts, bins, _ = plt.hist(data, bins=200)
    

Or if you need the bins before plotting:

  1. np.histogram with plt.stairs

    counts, bins = np.histogram(data, bins=200)
    plt.stairs(counts, bins, fill=True)
    

    Note that stair plots require matplotlib 3.4.0+.

  2. pd.cut with plt.hist

    _, bins = pd.cut(data, bins=200, retbins=True)
    plt.hist(data, bins)
    

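
For the "store the data in the appropriate bins" part, one possible sketch is to combine np.histogram with np.digitize, which reports the bin each value falls into. The sample data and the `values_per_bin` list are made up for illustration, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 20, 1000)  # made-up sample data

counts, edges = np.histogram(data, bins=200)

# np.digitize returns 1-based bin numbers; clip so the maximum value,
# which sits on the last edge, lands in the last bin as np.histogram does.
bin_index = np.clip(np.digitize(data, edges) - 1, 0, len(counts) - 1)

# Group the raw values by bin (illustrative structure).
values_per_bin = [data[bin_index == i] for i in range(len(counts))]

print(len(edges), counts.sum(), sum(len(v) for v in values_per_bin))
```

Here `counts[i]` is the number of values in bin `i`, and `values_per_bin[i]` holds the values themselves.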

tdy
There are a couple of ways to do this.

If you cannot guarantee that your items are all numeric and of the same type, use collections.Counter from the standard library:

import collections
hist = dict(collections.Counter(your_list))
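
As a small illustration with made-up items, note that Counter tallies exact values rather than grouping them into ranges, so it only needs the items to be hashable:

```python
import collections

# Made-up mixed-type items; Counter only needs them to be hashable.
your_list = ["a", 3, "b", "a", 3.5, 3, "a"]

# Each distinct value becomes a key, mapped to how often it occurs.
hist = dict(collections.Counter(your_list))
print(hist)
```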

Otherwise, if your data is guaranteed to be numeric and all of the same type, use NumPy:

import numpy as np
# for one-dimensional data
(hist, bin_edges) = np.histogram(your_list)
# for two-dimensional data (x and y are separate coordinate sequences)
(hist, xedges, yedges) = np.histogram2d(x, y)
# for N-dimensional data (your_array has shape (n_samples, n_dims))
(hist, edges) = np.histogramdd(your_array)

NumPy's histogram functions are the full-featured option: np.histogram can work out a suitable number of bins for you and supports weighting, and the algorithms it uses are well documented, with example code.
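
A minimal sketch of those two features, using made-up data and an illustrative choice of weights (each sample weighted by 1/N, so the weighted counts are fractions):

```python
import numpy as np

rng = np.random.default_rng(0)
your_list = rng.normal(0, 20, 1000)  # made-up numeric data
weights = np.ones_like(your_list) / len(your_list)

# bins="auto" lets NumPy choose the number of bins from the data.
hist, bin_edges = np.histogram(your_list, bins="auto")

# With weights, each sample contributes its weight instead of 1;
# with 1/N weights the weighted counts sum to 1.
frac, _ = np.histogram(your_list, bins=bin_edges, weights=weights)

print(len(bin_edges) - 1, hist.sum(), frac.sum())
```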

mkrieger1
Trevor Boyd Smith