I am trying to automate a frequency diagram with matplotlib in Python in order to count occurrences, instead of having to plot it manually in Excel. However, I have not been able to make the diagram look as close as possible to the one I made in Excel. Is this possible with Matplotlib?

In Excel:

[image: frequency chart made in Excel]

Code:

#!/usr/bin/python

import matplotlib.pyplot as plt

x = [6,0,0,26,0,0,0,0,5,0,7,0,12,12,0,0,0,3,0,5,5,0,10,4,3,5,1,0,2,0,0,1,0,8,0,3,7,1,0,0,0,1,1,0,0,0,0,0,7,16,0,0,0,5]


plt.hist(x)
plt.title("Frequency diagram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

Result (the readability is not as good as in Excel; how can I make it as similar as possible to the Excel graph?):

[image: histogram produced by the matplotlib code above]

2 Answers

import numpy as np
import matplotlib.pyplot as plt

def make_hist(ax, x, bins=None, binlabels=None, width=0.85, extra_x=1, extra_y=4, 
              text_offset=0.3, title=r"Frequency diagram", 
              xlabel="Values", ylabel="Frequency"):
    if bins is None:
        xmax = max(x)+extra_x
        bins = range(xmax+1)
    if binlabels is None:
        if np.issubdtype(np.asarray(x).dtype, np.integer):
            binlabels = [str(bins[i]) if bins[i+1]-bins[i] == 1 else 
                         '{}-{}'.format(bins[i], bins[i+1]-1)
                         for i in range(len(bins)-1)]
        else:
            binlabels = [str(bins[i]) if bins[i+1]-bins[i] == 1 else 
                         '{}-{}'.format(*bins[i:i+2])
                         for i in range(len(bins)-1)]
        if bins[-1] == np.inf:
            binlabels[-1] = '{}+'.format(bins[-2])
    n, bins = np.histogram(x, bins=bins)
    patches = ax.bar(range(len(n)), n, align='center', width=width)
    ymax = max(n)+extra_y

    ax.set_xticks(range(len(binlabels)))
    ax.set_xticklabels(binlabels)

    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_ylim(0, ymax)
    ax.grid(True, axis='y')
    # http://stackoverflow.com/a/28720127/190597 (peeol)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['left'].set_visible(False)
    # http://stackoverflow.com/a/11417222/190597 (gcalmettes)
    ax.xaxis.set_ticks_position('none')
    ax.yaxis.set_ticks_position('none')
    autolabel(ax, patches, text_offset)

def autolabel(ax, rects, shift=0.3):
    """
    Attach a count label above each non-empty bar.
    http://matplotlib.org/1.2.1/examples/pylab_examples/barchart_demo.html
    """
    for rect in rects:
        height = rect.get_height()
        if height > 0:
            # draw on the axes that own the bars, not on pyplot's current axes
            ax.text(rect.get_x()+rect.get_width()/2., height+shift, '%d'%int(height),
                    ha='center', va='bottom')

x = [6,0,0,26,0,0,0,0,5,0,7,0,12,12,0,0,0,3,0,5,5,0,10,4,3,5,1,0,2,0,0,1,0,8,0,
     3,7,1,0,0,0,1,1,0,0,0,0,0,7,16,0,0,0,5,41]
fig, ax = plt.subplots(figsize=(14,5))
# make_hist(ax, x)
# make_hist(ax, [1,1,1,0,0,0], extra_y=1, text_offset=0.1)
make_hist(ax, x, bins=list(range(10))+list(range(10,41,5))+[np.inf], extra_y=6)
plt.show()

make_hist attempts to identify if all the values in x are integers. If so, it uses integer-based bin labels. For example, the bin label 10-14 represents the range [10, 14] (inclusive).

If, on the other hand, x contains floats, then make_hist will use half-open float-based bin labels. For example, 10-15 would represent the half-open range [10, 15).
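To make the two labelling conventions concrete, here is a minimal standalone sketch of the same idea. The helper name `make_bin_labels` and its `integer_data` flag are made up for illustration and are not part of `make_hist`; the logic only mirrors what is described above.

```python
import math

def make_bin_labels(edges, integer_data=True):
    """Build one label per bin from a list of bin edges (hypothetical helper)."""
    labels = []
    for left, right in zip(edges[:-1], edges[1:]):
        if math.isinf(right):
            # open-ended last bin, e.g. "40+"
            labels.append("{}+".format(left))
        elif right - left == 1:
            # bin of width 1: label it with its left edge only
            labels.append(str(left))
        elif integer_data:
            # integers: show the inclusive range [left, right - 1]
            labels.append("{}-{}".format(left, right - 1))
        else:
            # floats: show the half-open range [left, right)
            labels.append("{}-{}".format(left, right))
    return labels

int_labels = make_bin_labels([0, 1, 2, 5, 10, math.inf])
float_labels = make_bin_labels([0.0, 2.5, 5.0], integer_data=False)
print(int_labels)
print(float_labels)
```

So an integer bin with edges 5 and 10 is labelled `5-9`, while a float bin with edges 2.5 and 5.0 is labelled `2.5-5.0`.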

unutbu
  • Wow, thanks, that's a lot of code! However, when I change the list to, for example, `x = [1,1,1,0,0,0]`, the diagram is not displayed correctly. Is it dynamic? The values will be different each time. –  Jul 31 '16 at 18:26
  • I was off-by-one in my definition of `bins`. Perhaps try it again. – unutbu Jul 31 '16 at 18:37
  • Thanks. However, I am wondering why it is not showing any values above 35. Is it possible, for example, to have a last column labelled `40+` that collects all values above 40? The `x` values may change and are not static. Thanks a lot for your help. –  Jul 31 '16 at 19:06
  • To get those exact labels, I think you would need to define the `bins` and `binlabels` manually. I've edited the post to show how. – unutbu Jul 31 '16 at 19:13
  • Thanks man! I really appreciate your effort. However, when I add `41` to the list, it does not appear in the graph. Is it possible to push 0 to the left to make room for it? Result(https://www.dropbox.com/s/lsnf38svk6laydi/2016-07-31%2021_26_31-Figure%201.png?dl=0) –  Jul 31 '16 at 19:26
  • The `bins` define the edges of the histogram bins. So if you add some really huge number (like `np.inf`) as the last number in `bins`, then all numbers from 40 upwards will be counted in the last bin. – unutbu Jul 31 '16 at 19:34
  • Thanks! How can I increase the font-size of the labels ? –  Aug 01 '16 at 17:51
  • You can set the `labelsize` with `plt.tick_params(axis='both', which='major', labelsize=16)`. See Autiwa's answer [here](http://stackoverflow.com/a/11386056/190597). – unutbu Aug 01 '16 at 18:01
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/118839/discussion-between-user3580316-and-unutbu). –  Aug 01 '16 at 18:05
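
The `np.inf` suggestion from the comments can be checked in isolation. This is only a small sketch with made-up data and edges (the names `data` and `edges` are not from the answer); it relies on `np.histogram` accepting explicit, non-uniform bin edges.

```python
import numpy as np

# illustrative data and edges, chosen so the counts are easy to verify by hand
data = [0, 3, 12, 26, 40, 41, 100]
edges = [0, 10, 20, 30, 40, np.inf]  # np.inf makes the last bin open-ended

# every bin is half-open [left, right), except the last one, which is closed
counts, _ = np.histogram(data, bins=edges)
print(counts)
```

The values 40, 41 and 100 all land in the last bin, which is the one `make_hist` labels `40+`.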

Matplotlib does support styling. You may prefer the ggplot style:

plt.style.use('ggplot')

There are many other premade styles, or you can create your own: http://matplotlib.org/users/style_sheets.html
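
As a rough sketch of how a style could be applied to the frequency diagram from the question: the function name `plot_with_style` is made up for illustration, and using `plt.style.context` instead of `plt.style.use` is my own choice so that the style only affects this one figure.

```python
import matplotlib.pyplot as plt

def plot_with_style(values, style="ggplot"):
    """Draw a one-bin-per-integer histogram using a named matplotlib style."""
    # the style is active only inside this block, so other figures are unaffected
    with plt.style.context(style):
        fig, ax = plt.subplots()
        # edges 0, 1, ..., max+1 give one bin per integer value
        ax.hist(values, bins=range(max(values) + 2))
        ax.set_title("Frequency diagram")
        ax.set_xlabel("Value")
        ax.set_ylabel("Frequency")
    return fig, ax

# list the style names that ship with the installed matplotlib
print(plt.style.available)
```

Call `plot_with_style(x)` and then `plt.show()` to display the result; any name printed by `plt.style.available` can be passed as `style`.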

Chris