I am generating histograms with matplotlib.

I need the bins to be of unequal width as I'm mostly interested in the lowest bins. Right now I'm doing this:

plt.hist(hits_array, bins=list(range(0, 50, 10)) + list(range(50, 550, 50)))

This creates what I want (the first 5 bins have a width of 10, the rest of 50), but the first five bins are, of course, narrower than the latter ones, as all bins are displayed on the same axis.

Is there a way to influence the x-axis or histogram itself so I can break the scale after the first 5 bins, so all bins are displayed as equally wide?

(I realize that this will create a distorted view, and I'm fine with that, though I wouldn't mind a bit of space between the two differently scaled parts of the axis.)

Any help will be greatly appreciated. Thanks!

CodingCat

3 Answers

I had a similar question, "Matplotlib histogram with collection bin for high values", and the answer there was to use a dirty hack.

So with the following code, you get the ugly histogram you already have.

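import numpy as np
import matplotlib.pyplot as plt

def plot_histogram_04():
    limit1, limit2 = 50, 550
    binwidth1, binwidth2 = 10, 50
    # 1000 values in [0, 50) plus 100 values in [0, 550)
    data = np.hstack((np.random.rand(1000) * limit1, np.random.rand(100) * limit2))

    # Python 3: range objects cannot be added, so convert them to lists first
    bins = list(range(0, limit1, binwidth1)) + list(range(limit1, limit2, binwidth2))

    plt.subplots(1, 1)
    plt.hist(data, bins=bins)
    plt.savefig('my_plot_04.png')
    plt.close()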

In order to make the bins appear equally wide, you indeed have to make them equally wide! This means transforming your data so that every value falls into a bin of the same width, and then relabelling the x-axis ticks with the original bin edges.

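import numpy as np
import matplotlib.pyplot as plt

def plot_histogram_05():
    limit1, limit2 = 50, 550
    binwidth1, binwidth2 = 10, 50

    data = np.hstack((np.random.rand(1000) * limit1, np.random.rand(100) * limit2))

    # Real bin edges; these are only used as tick labels
    orig_bins = list(range(0, limit1, binwidth1)) + list(range(limit1, limit2 + binwidth2, binwidth2))
    # Squeeze every value above limit1 so that a wide bin becomes as wide as a narrow one
    data = [(i - limit1) / (binwidth2 / binwidth1) + limit1
            if i >= limit1 else i for i in data]
    # Equal-width bins on the squeezed axis (integer division keeps range() happy in Python 3)
    bins = list(range(0, limit2 // (binwidth2 // binwidth1) + limit1, binwidth1))

    _, ax = plt.subplots(1, 1)
    plt.hist(data, bins=bins)

    # Plain strings; a '|S3' array would show up as b'...' labels in Python 3
    xlabels = [str(edge) for edge in orig_bins]
    N_labels = len(xlabels)
    print(xlabels)
    print(bins)
    plt.xlim([0, bins[-1]])
    plt.xticks(binwidth1 * np.arange(N_labels))
    ax.set_xticklabels(xlabels)

    plt.savefig('my_plot_05.png')
    plt.close()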

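For what it's worth, the rescaling step can also be written without the list comprehension. The following is only a sketch under my own naming: `compress_values` and its parameters are hypothetical helpers, not part of matplotlib or NumPy, and the defaults simply mirror the numbers used above.

```python
import numpy as np

def compress_values(values, limit=50, narrow=10, wide=50):
    """Squeeze values at or above `limit` so a `wide` bin becomes `narrow` wide.

    Hypothetical helper: the name and defaults are assumptions that mirror
    limit1, binwidth1 and binwidth2 from the answer above.
    """
    values = np.asarray(values, dtype=float)
    factor = wide / narrow  # 5.0 for the default numbers
    # Values below the limit are left alone; the rest are divided by `factor`
    # relative to the limit, so the axis stays continuous at `limit`.
    return np.where(values >= limit, (values - limit) / factor + limit, values)

# The real bin edges map onto equally spaced edges on the squeezed axis.
edges = np.array([0, 10, 20, 30, 40, 50, 100, 150, 550])
compressed_edges = compress_values(edges)
```

Passing `compress_values(data)` to `plt.hist` with equal-width bins should then give the same picture as the loop version, and `compress_values(orig_bins)` gives the tick positions for the original edges.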

physicalattraction

You can use bar, in which case there is no need to split the axis. Here is an example:

import matplotlib.pyplot as plt
import numpy as np

data = np.hstack((np.random.rand(1000) * 50, np.random.rand(100) * 500))
binwidth1, binwidth2 = 10, 50
# Python 3: convert the ranges to lists before concatenating them
bins = list(range(0, 50, binwidth1)) + list(range(50, 550, binwidth2))

fig, ax = plt.subplots(1, 1)

y, binEdges = np.histogram(data, bins=bins)
centers = 0.5 * (binEdges[1:] + binEdges[:-1])

# Draw every bar with the same width, whatever the real width of its bin
ax.bar(centers[:5], y[:5], width=.8 * binwidth1, align='center')
ax.bar(centers[5:], y[5:], width=.8 * binwidth1, align='center')
plt.show()

In case you really want to split the axis, have a look here.
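A variation on the bar idea, in case the bars should also sit next to each other with a small gap between the two differently scaled parts: plot the counts against the bin index instead of the bin centre and label each bar with its range. This is only a sketch; `bar_positions` and `plot_equal_width_hist` are names I made up for illustration, and the gap of one bar width is an arbitrary choice.

```python
import numpy as np

def bar_positions(n_narrow, n_total, gap=1.0):
    """Return one x position per bin, leaving `gap` after the narrow bins.

    Hypothetical helper, not a matplotlib function.
    """
    positions = np.arange(n_total, dtype=float)
    positions[n_narrow:] += gap  # shift the wide bins to the right
    return positions

def plot_equal_width_hist(data, edges, n_narrow, gap=1.0, filename=None):
    """Draw a histogram with unequal bins as equally wide, evenly spaced bars."""
    import matplotlib.pyplot as plt  # imported here; only needed for plotting

    counts, edges = np.histogram(data, bins=edges)
    x = bar_positions(n_narrow, len(counts), gap)

    fig, ax = plt.subplots()
    ax.bar(x, counts, width=0.8, align='center')
    # Label every bar with the real range of its bin.
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]
    ax.set_xticks(x)
    ax.set_xticklabels(labels, rotation=45)
    if filename is not None:
        fig.savefig(filename)
    return fig, ax

# Same bin edges as in the question: five bins of width 10, then bins of width 50.
edges = list(range(0, 50, 10)) + list(range(50, 550, 50))
positions = bar_positions(n_narrow=5, n_total=len(edges) - 1)
```

Calling `plot_equal_width_hist(data, edges, n_narrow=5, filename='my_plot.png')` with the `data` array from above should then produce the figure. As the question already notes, the x-axis is no longer to scale, so the tick labels carry all the information about the bin widths.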

imsc
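import pandas as pd
import numpy as np

# `data` is assumed to be an existing pandas DataFrame (it is not defined in this snippet)
df = data

# Bin edges are passed explicitly via the `bins` argument
bins = np.arange(0, 0.1, 0.001)
df.hist(bins=bins, color='grey')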
danday74
TVC
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – cheersmate Oct 24 '18 at 11:07
  • I would tend to agree with this comment, but this answer is very nice, does exactly what is asked, and is also very simple to understand if you ever used the bins argument. – charelf Apr 14 '21 at 08:06