
I am new to Python, and I have a question about how to draw a histogram with it.

First of all, I have ten intervals of equal width that span the petal lengths of the flowers, from the minimum to the maximum. Each flower therefore falls into one of ten intervals according to its petal length.

There are three kinds of flowers, so I want to draw a histogram that shows how the different kinds are distributed over the intervals (bins). Within the same bin, each kind of flower should have its own color.

I know the hist function in Matplotlib, but I don't know how to use it to draw a picture like the one below.

wanted histogram

The bin edges are Lbins = [0.1 , 0.34, 0.58, 0.82, 1.06, 1.3 , 1.54, 1.78, 2.02, 2.26, 2.5 ], and Data_bins is an array of shape (number of flowers, 3).

Boiethios
周靖哲
  • 1
    Could you include in the question some code of your attempt using the `hist` function, with some data, so we can see what is not working? – xdze2 Aug 26 '18 at 14:00
  • Lbins = [0.1 , 0.34, 0.58, 0.82, 1.06, 1.3 , 1.54, 1.78, 2.02, 2.26, 2.5 ]; Data_bins = 2-dimensional data with flowers and bins; plt.hist(Data_bins, Lbins) – 周靖哲 Aug 28 '18 at 06:40

1 Answer


Here is an example of a histogram with multiple bars for each bin, using hist from Matplotlib:

import numpy as np
import matplotlib.pyplot as plt

length_of_flowers = np.random.randn(100, 3)
Lbins = [0.1 , 0.34, 0.58, 0.82, 1.06, 1.3 , 1.54, 1.78, 2.02, 2.26, 2.5 ]
# Lbins could also simply be the number of wanted bins

colors = ['red','yellow', 'blue']
labels = ['red flowers', 'yellow flowers', 'blue flowers']
plt.hist(length_of_flowers, Lbins,
         histtype='bar',
         stacked=False,  
         fill=True,
         label=labels,
         alpha=0.8, # opacity of the bars
         color=colors,
         edgecolor = "k")

# plt.xticks(Lbins) # to set the ticks according to the bins
plt.xlabel('flower length'); plt.ylabel('count');
plt.legend();
plt.show()

which gives:

example hist
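To make the role of the raw data clearer: `hist` receives one value per flower and does the counting itself. The same counts can be computed explicitly with `np.histogram`, one call per column. This is only a sketch; the random data and the seed are made up for illustration, and the variable name `counts` is not from the question.

```python
import numpy as np

# Made-up raw data: 100 flowers for each of 3 kinds (one column per kind).
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
length_of_flowers = rng.normal(size=(100, 3))

# 11 bin edges define 10 bins.
Lbins = [0.1, 0.34, 0.58, 0.82, 1.06, 1.3, 1.54, 1.78, 2.02, 2.26, 2.5]

# For each kind, count how many raw values fall into each bin.
# Values outside [0.1, 2.5] are simply not counted.
counts = np.column_stack([
    np.histogram(length_of_flowers[:, k], bins=Lbins)[0]
    for k in range(length_of_flowers.shape[1])
])

print(counts.shape)  # (10, 3): one row per bin, one column per kind
```

So an array of shape (100, 3) is the input that `hist` expects, while an array of shape (10, 3) is the kind of result that the binning step produces.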

Edit: Solution for pre-binned data, inspired by this matplotlib demo. The position of each bar is computed by hand. I slightly modified the data by replacing some of the zero values, in order to verify that the bars are aligned correctly.

import numpy as np
import matplotlib.pyplot as plt

binned_data = np.array([[41., 3., 3.], [ 8., 3., 3.], [ 1., 2., 2.], [ 2., 7., 3.],
                        [ 0., 20., 0.], [ 1., 21., 1.], [ 1., 2., 4.], [ 3., 4., 23.],
                        [ 0., 0., 9.], [ 3., 1., 14.]]).T

# The shape of the data array has to be:
#  (number of categories x number of bins)
print(binned_data.shape)  # >> (3, 10)

x_positions = np.array([0.1, 0.34, 0.58, 0.82, 1.06, 1.3, 1.54, 1.78, 2.02, 2.26])

number_of_groups = binned_data.shape[0]
fill_factor = .8  # fraction of the space between ticks
                  # that each group of bars occupies
bar_width = np.diff(x_positions).min()/number_of_groups * fill_factor

colors = ['red','yellow', 'blue']
labels = ['red flowers', 'yellow flowers', 'blue flowers']

for i, groupdata in enumerate(binned_data): 
    bar_positions = x_positions - number_of_groups*bar_width/2 + (i + 0.5)*bar_width
    plt.bar(bar_positions, groupdata, bar_width,
            align='center',
            linewidth=1, edgecolor='k',
            color=colors[i], alpha=0.7,
            label=labels[i])

plt.xticks(x_positions)  # one tick per group of bars
plt.legend(); plt.xlabel('flower length'); plt.ylabel('count')
plt.show()

which gives:

example with binned data
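A possible alternative for pre-binned data, offered only as a sketch: `hist` accepts a `weights` argument, so one can pass the bin centers as the "data" and the pre-computed counts as the weights. `hist` then places the grouped bars itself. The names `centers` and `n_kinds` are introduced here for illustration, and the counts are the example data from above.

```python
import numpy as np
import matplotlib.pyplot as plt

Lbins = np.array([0.1, 0.34, 0.58, 0.82, 1.06, 1.3, 1.54, 1.78, 2.02, 2.26, 2.5])

# Pre-binned counts, shape (number of bins, number of kinds) = (10, 3).
binned_data = np.array([[41, 3, 3], [8, 3, 3], [1, 2, 2], [2, 7, 3],
                        [0, 20, 0], [1, 21, 1], [1, 2, 4], [3, 4, 23],
                        [0, 0, 9], [3, 1, 14]], dtype=float)

# One representative value per bin: its center.
centers = (Lbins[:-1] + Lbins[1:]) / 2
n_kinds = binned_data.shape[1]

fig, ax = plt.subplots()
# Each center is counted once, weighted by the pre-computed count for that bin.
n, edges, patches = ax.hist(
    [centers] * n_kinds,
    bins=Lbins,
    weights=[binned_data[:, k] for k in range(n_kinds)],
    color=['red', 'yellow', 'blue'],
    label=['red flowers', 'yellow flowers', 'blue flowers'],
    edgecolor='k')

ax.set_xlabel('flower length')
ax.set_ylabel('count')
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. With this approach the bars sit inside each bin, between its two edges, rather than being centered on a tick.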

xdze2
  • Thank you so much, but if I use my data, it will be crushed. array([[41., 0., 0.], [ 8., 0., 0.], [ 1., 0., 0.], [ 0., 7., 0.], [ 0., 20., 0.], [ 0., 21., 0.], [ 0., 2., 4.], [ 0., 0., 23.], [ 0., 0., 9.], [ 0., 0., 14.]]) – 周靖哲 Aug 31 '18 at 02:14
  • I think the data is bad – 周靖哲 Aug 31 '18 at 02:15
  • There are just ten intervals, from Lbin 1 to Lbin 11. Each Lbin is a real number. Why should the data, the length of flowers, have shape (100, 3)? Why is it not (10, 3)? – 周靖哲 Aug 31 '18 at 02:17
  • @周靖哲 The `hist` function takes unprocessed data as input and then performs the binning itself. As your data is already binned, there is no need to use `hist`. However, with the `bar` function, I think there is no other choice than to compute the positions of the grouped bars "by hand". I added an example. – xdze2 Aug 31 '18 at 10:00
  • 1
    I am now thinking that mixing categorical data (color of flower) with continuous data (flower length) on the same axis is actually confusing... I think it would be better to use three line graphs (with plot), one for each color, so that the x-axis corresponds solely to flower length – xdze2 Aug 31 '18 at 10:04
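
The line-graph idea from the last comment could look roughly like the following. This is a minimal sketch, assuming the pre-binned counts posted in the comments above and plotting each count at the center of its bin; the name `centers` and the choice of markers are illustrative, not from the thread.

```python
import numpy as np
import matplotlib.pyplot as plt

Lbins = np.array([0.1, 0.34, 0.58, 0.82, 1.06, 1.3, 1.54, 1.78, 2.02, 2.26, 2.5])

# Pre-binned counts from the comments, shape (10 bins, 3 kinds).
binned_data = np.array([[41, 0, 0], [8, 0, 0], [1, 0, 0], [0, 7, 0],
                        [0, 20, 0], [0, 21, 0], [0, 2, 4], [0, 0, 23],
                        [0, 0, 9], [0, 0, 14]], dtype=float)

# Plot each count at the center of its bin.
centers = (Lbins[:-1] + Lbins[1:]) / 2

colors = ['red', 'yellow', 'blue']
labels = ['red flowers', 'yellow flowers', 'blue flowers']

fig, ax = plt.subplots()
for k, (color, label) in enumerate(zip(colors, labels)):
    # One line per kind of flower; the x-axis is purely the flower length.
    ax.plot(centers, binned_data[:, k], marker='o', color=color, label=label)

ax.set_xlabel('flower length')
ax.set_ylabel('count')
ax.legend()
```

Call `plt.show()` afterwards to display the figure. Since no bars have to share a bin, no manual positioning is needed.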