0

For an array of values between 0 and 1, I want to create a histogram with 5 bins, where bin 1 shows the frequency (number of occurrences) of values between 0 and 0.2, bin 2 shows the frequency of values between 0.2 and 0.4, bin 3 covers 0.4-0.6, bin 4 covers 0.6-0.8, and bin 5 covers 0.8-1.

import numpy as np
import matplotlib.pyplot as plt

arr = np.array([0.5, 0.1, 0.05, 0.67, 0.8, 0.9, 1, 0.22, 0.25])
y, other_stuff = np.histogram(arr, bins=5)
x = range(0, 5)
graph = plt.bar(x, height=y)
plt.show()
eyllanesc
Sam
  • What is your issue exactly? One issue is that if 0 and 1 aren't in the data, the bins won't be the ones you expect. You'll want to define your bins (or range) explicitly, since otherwise they are determined by the min and max of your data. – busybear Nov 21 '18 at 01:15
  • In case you want to use a bar plot, see e.g. [this answer](https://stackoverflow.com/a/44003868/4124317). – ImportanceOfBeingErnest Nov 21 '18 at 02:05
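To illustrate the point about bins being derived from the data's min and max, here is a minimal sketch using the array from the question. It compares `bins=5` on its own with `bins=5` plus the `range` argument of `np.histogram`. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([0.5, 0.1, 0.05, 0.67, 0.8, 0.9, 1, 0.22, 0.25])

# bins=5 alone: the edges span min(arr)..max(arr), here 0.05..1.0
counts_auto, edges_auto = np.histogram(arr, bins=5)

# bins=5 with an explicit range: the edges span 0..1 regardless of the data
counts_fixed, edges_fixed = np.histogram(arr, bins=5, range=(0, 1))

print("auto edges: ", edges_auto)
print("auto counts:", counts_auto)
print("fixed edges: ", edges_fixed)
print("fixed counts:", counts_fixed)
```

With this particular array the two calls assign some values to different bins, so the bar heights differ even though both use five bins.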

2 Answers

3

I think you are looking for matplotlib's hist method.

With your sample array the code would look like:

import matplotlib.pyplot as plt
import numpy as np

# arr is the sample array defined in the question
plt.hist(arr, bins=np.linspace(0, 1, 6), ec='black')
plt.show()

[Image: histogram produced by the code above]

LaSul
Zito Relova
  • I don't want to hard-code the number of bins because the number of bins is variable: in this example I only have 5, but I could have 30, or 40, or 1000. The graph in this picture is an example of what I want, but I might have 5 graphs like this beside each other, so ideally I want each array index of y to be a bar – Sam Nov 21 '18 at 12:37
  • Ignore everything after the last comma in my last sentence – Sam Nov 21 '18 at 12:48
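For a variable number of bins, one possible approach is to build the edges from the bin count and let `plt.bar` take its positions and widths from those edges. The sketch below assumes the data always lies in [0, 1]. `bar_histogram` is a hypothetical helper written for this illustration, not a matplotlib or NumPy function.

```python
import numpy as np
import matplotlib.pyplot as plt


def bar_histogram(values, n_bins, ax=None):
    """Hypothetical helper: bin values from [0, 1] into n_bins equal bins
    and draw one bar per bin."""
    # n_bins bins need n_bins + 1 edges, fixed to the interval 0..1
    edges = np.linspace(0, 1, n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    if ax is None:
        ax = plt.gca()
    # align="edge" starts each bar at its left bin edge
    bars = ax.bar(edges[:-1], counts, width=np.diff(edges),
                  align="edge", edgecolor="black")
    return counts, edges, bars


arr = np.array([0.5, 0.1, 0.05, 0.67, 0.8, 0.9, 1, 0.22, 0.25])

# Same data with three different bin counts, one subplot each
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, (5, 10, 30)):
    bar_histogram(arr, n, ax=ax)
    ax.set_title(f"{n} bins")

# plt.show()  # uncomment to display the figure
```

Because the edges are computed from `n_bins`, the same function works for 5, 30, or 1000 bins without changing the plotting code.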
-2

Is this what you are after?

import numpy
from numpy.random import random
import matplotlib.pyplot as plt
arr = random(100)
y, other_stuff = numpy.histogram(arr, bins=5)
x = numpy.linspace(0.1, 0.9, 5)
graph = plt.bar(x, height=y, width=0.2, edgecolor='black')
plt.show()

As pointed out in the comment below, the snippet above does not define the edges of the bins in the call to histogram(). The one below corrects for that.

import numpy
import matplotlib.pyplot as plt
arr = numpy.array([0.5, 0.1, 0.05, 0.67, 0.8, 0.9, 1, 0.22, 0.25])
y, other_stuff = numpy.histogram(arr, bins=numpy.linspace(0, 1, 6))
graph = plt.bar(numpy.linspace(0.1, 0.9, 5), height=y, width=0.2,
                edgecolor='black')
plt.show()
Patol75
  • 1
    Attention! this only looks like it's working correctly. And that only by coincidence. You will need to define the bins as in the other answer. – ImportanceOfBeingErnest Nov 21 '18 at 01:53
  • What do you mean? The call to histogram() gives back 5 values as per bins=5, and the call to bar() specifies the location of the center of the bins (through x and the default keyword align='center') as well as their total width. I agree hist() does the job way better (and I have upvoted the answer), but I do not see why my answer would be only coincidentally working. I just used and improved the snippet provided by the OP. – Patol75 Nov 21 '18 at 04:42
  • 1
    If you take 100 random points between 0 and 1 and put them into 5 bins, the likelihood is pretty high that those bins are close to `[0,0.2), [0.2,0.4), etc`. However, that is an assumption which is in general not true, and it will lead to wrong results if you take e.g. the data provided in the question. Your approach would then give [this result](https://i.stack.imgur.com/zx95b.png), which is obviously different from @user7374610's result, and you can easily count the values by hand to see that it is not binning the values correctly. – ImportanceOfBeingErnest Nov 21 '18 at 04:57
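The "count by hand" check mentioned in the last comment can be sketched in a few lines. The example assumes the documented `np.histogram` convention that every bin is half-open except the last one, which also includes its right edge. The names `edges` and `manual` are illustrative only.

```python
import numpy as np

arr = np.array([0.5, 0.1, 0.05, 0.67, 0.8, 0.9, 1, 0.22, 0.25])

# Explicit edges for five equal bins on 0..1
edges = np.linspace(0, 1, 6)
counts, _ = np.histogram(arr, bins=edges)

# Count by hand: each bin is [lo, hi), except the last, which is [lo, hi]
manual = []
last = len(edges) - 2
for i in range(len(edges) - 1):
    lo, hi = edges[i], edges[i + 1]
    if i == last:
        mask = (arr >= lo) & (arr <= hi)
    else:
        mask = (arr >= lo) & (arr < hi)
    manual.append(int(mask.sum()))

print("np.histogram:", counts.tolist())
print("by hand:     ", manual)
```

If the two lists agree, the bars drawn from `counts` represent the intended intervals.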