I have a Pandas dataframe containing 6000 values ranging between 1 and 2500. I would like to create a chart with a predetermined x-axis, i.e. [1,2,4,8,16,32,64,128,256,512,more], and a bar for each of these counts. I've been looking into numpy.histogram, but it does not seem to let me choose the bin range (it estimates one); the same goes for matplotlib.

The code I've tried so far is:

plt.hist(df['cnt'],bins=[0,1,2,4,8,16,32,64,128,256,512])
plt.show()

np.histogram(df['cnt'])

and then plotting the np data, but it does not look the way I want it to.

I hope my question makes sense; otherwise I will try to expand.

EDIT: When I run

plt.hist(df['cnt'],bins=[0,1,2,4,8,16,32,64,128,256,512])
plt.show()

i get:

enter image description here

What i want:

enter image description here

The second one was made in Excel using the Data Analysis histogram function. I hope this gives a better picture of what I would like to do.

Steffen Hvid
  • `numpy.histogram` as well as `pyplot.hist` let you choose the bins via their `bins` argument, so the approach is correct in principle. You would want to explain exactly what the problem is when using the `bins` argument. "but i does not look like i want it." is not a proper problem description. You may also consider posting an image of the issue, which might make it easier to explain ("In the image you see __, however I would like to see __"). – ImportanceOfBeingErnest Aug 22 '17 at 09:55
  • Not clear what exactly you are looking for. I guess [this answer](https://stackoverflow.com/a/5328669/5864582) might be of help. – akilat90 Aug 22 '17 at 10:36
  • Are the values contained in `df['cnt']` only those values listed in `bins`? If not, the excel chart would not make much sense, but just to make sure we are talking about the same thing here. – ImportanceOfBeingErnest Aug 22 '17 at 11:22
  • In Excel you apply a function and it generates two columns, one with predetermined bins and one with the count. Then I make a bar plot of (bin, count) - But it looks like plt.hist() is making the right chart, it might just be something with the grouping and formatting – Steffen Hvid Aug 22 '17 at 11:30
  • That did not answer my question. Are the values contained in `df['cnt']` only those values listed in `bins`? My point is that the Excel chart does not let you know the bin edges. In the plot from @tom's answer below it is e.g. clear that the first bin ranges from 1 to 2; in the Excel plot you don't know if it is from 0 to 1 or from 0.5 to 1.5 or whatever else. This would not be a problem in case there are only values at 1,2,4, etc. in the `df['cnt']` column. – ImportanceOfBeingErnest Aug 22 '17 at 11:37
  • The data only contains integers, all strictly positive. As far as I know, Excel takes values up to and including one in the first bar. So in this case [0,1],(1,2],(2,4]... but I'm not sure. Anyway, the example from tom is perfect and I have made it work in the code – Steffen Hvid Aug 22 '17 at 11:56
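The right-closed grouping described in the last comment, (0,1], (1,2], (2,4], ... plus a final "more" bin, can be sketched with NumPy alone. This is only a minimal sketch that assumes Excel really does count each value into the first bin whose upper edge is greater than or equal to it, as the asker suspects; `excel_style_counts` and `sample` are hypothetical names and made-up values, not part of the original post.

```python
import numpy as np

def excel_style_counts(values, edges):
    """Count values into right-closed bins (prev, edge], plus a trailing 'more' bin.

    `edges` are the upper bin edges, sorted ascending. Assumption: a value
    equal to an edge belongs to that edge's bin.
    """
    values = np.asarray(values)
    edges = np.asarray(edges)
    # side='left' gives the index of the first edge >= value;
    # values above the last edge get index len(edges), i.e. the 'more' bin.
    idx = np.searchsorted(edges, values, side="left")
    return np.bincount(idx, minlength=len(edges) + 1)

edges = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
sample = [1, 2, 3, 4, 5, 512, 513, 2500]  # made-up values for illustration
counts = excel_style_counts(sample, edges)
```

Note that `np.histogram` and `plt.hist` use left-closed bins instead (every bin is [a, b) except the last, which is [a, b]), so integer data sitting exactly on the edges is grouped differently there.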

1 Answer

I think you want a base-2 logarithmic scale on the x-axis.

You can do that by setting ax.set_xscale('log', base=2) (on Matplotlib older than 3.3 the keyword is basex=2 instead).

You then also need to adjust the tick locations and formatting, which you can do with ax.xaxis.set_major_locator(ticker.FixedLocator(bins)) and ax.xaxis.set_major_formatter(ticker.ScalarFormatter())

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax = plt.subplots(1)

# Some fake data
cnt = np.random.lognormal(0.5, 2.0, 6000)

# Define your bins
bins = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

# Plot the histogram
ax.hist(cnt, bins=bins)

# Set scale to base-2 log (Matplotlib >= 3.3; older versions use basex=2)
ax.set_xscale('log', base=2)

# Set ticks and ticklabels using ticker
ax.xaxis.set_major_locator(ticker.FixedLocator(bins))
ax.xaxis.set_major_formatter(ticker.ScalarFormatter())

plt.show()

enter image description here
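If the goal is to mimic the Excel chart more literally (equally wide bars, one per bin, plus a "more" bar), one alternative is to compute the counts first and draw them as a plain bar chart at evenly spaced positions. This is only a sketch of that idea, not what the answer above does: it reuses the answer's fake lognormal data, the variable names are my own, and it relies on `np.histogram`'s left-closed bins, which differ from Excel's grouping for values that sit exactly on an edge.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake data in the spirit of the answer above (not the asker's data)
rng = np.random.default_rng(0)
cnt = rng.lognormal(0.5, 2.0, 6000)

# Bin edges as in the question; np.histogram bins are [a, b), last one [a, b]
edges = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
counts, _ = np.histogram(cnt, bins=edges)

# Everything above the last edge goes into a separate 'more' bar
overflow = np.count_nonzero(cnt > edges[-1])
heights = np.append(counts, overflow)

# One label per bar: the upper edge of each bin, then 'more'
labels = [str(e) for e in edges[1:]] + ["more"]
positions = np.arange(len(labels))

fig, ax = plt.subplots()
ax.bar(positions, heights, width=0.8)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("bin (upper edge)")
ax.set_ylabel("count")
```

Calling `plt.show()` afterwards displays the figure. Because the bars sit at categorical positions, the x-axis no longer represents a numeric scale, which is the trade-off raised in the comments on the question.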

tmdavison
  • Perfect, just what I needed. Now I just need to add a space between the bins, but I think I remember a built-in option for that in hist: rwidth, I think – Steffen Hvid Aug 22 '17 at 11:50
  • @SteffenHvid adding space between the bins of a histogram does not make any sense, as detailed in the comments below the question. – ImportanceOfBeingErnest Aug 22 '17 at 11:59
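For reference, `rwidth` is indeed a parameter of `hist`: it sets the bar width as a fraction of the bin width. A minimal sketch, again with fake data and a value of 0.8 chosen arbitrarily for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cnt = rng.lognormal(0.5, 2.0, 6000)  # fake data, not the asker's

bins = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

fig, ax = plt.subplots()
# rwidth=0.8 draws each bar at 80% of its bin width, leaving gaps between bars
n, bin_edges, patches = ax.hist(cnt, bins=bins, rwidth=0.8)
```

As the comment above points out, the gaps are purely cosmetic: the bins still cover the full range between their edges, so the bars no longer show where each bin begins and ends.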