I have just started learning gnuplot and currently need to plot some histograms. My data set is very compact around the mean and has a long tail. The net result is overlapping boxes and xtics where the data is compact. One of my initial ideas was to use variable-sized bin widths to control the spacing, but this did not work well. In the end, I settled on generating plots for different sigma ranges (e.g. 1σ, 2σ, 3σ), which limits the range. For larger sigmas, I change the graph type to linespoints (the overlapping xtics still occur, but it makes the graph a little more readable).
However, this sparked my curiosity about the proper way to do variable-width bins. Is this even possible?
Here is some example data. This is just a small subset, which is why there are noticeable gaps.
ITEMS COUNT SCALE
149 60 1
186 811 1
2986 180 2.21622
3069 189 2.21622
5827 45 3.13514
5940 37 3.13514
6189 34 4.2973
6346 32 4.2973
SCALE is supposed to be the bin width scale. Note that COUNT is adjusted for the scale. The idea is that a wider box would indicate that a scale has been factored into the count.
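To illustrate the idea (this is only a sketch, not what I actually ran): as far as I can tell from the documentation, the boxes style accepts an optional third using entry that gives the box width in x-axis units, so SCALE could in principle be mapped to a width. The base width of 37 below is a made-up value for illustration, and the inline datablock assumes gnuplot 5.0 or newer.

```gnuplot
# Hypothetical base bin width in ITEMS units (an assumption, not from my real data).
base = 37

# Same sample rows as above, without the header line (datablocks need gnuplot >= 5.0).
$data << EOD
149 60 1
186 811 1
2986 180 2.21622
3069 189 2.21622
5827 45 3.13514
5940 37 3.13514
6189 34 4.2973
6346 32 4.2973
EOD

set style fill solid 0.5 border
set xtics rotate

# using x:y:width -> column 1 is the box centre, column 2 the height,
# and SCALE times the assumed base width is the box width in x units.
plot $data using 1:2:($3*base) with boxes lc rgb 'blue' notitle
```

With this form the x axis is numeric, so each box is placed at its ITEMS value rather than at evenly spaced category positions.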
The setup for my plot was pretty basic:
set style data histogram
set style histogram cluster gap 1
set xtics rotate
plot '-' u 2:xtic(1) with boxes t '' lc rgb 'blue'
The part I am still confused about is the syntax of the entries for using. I've seen some other postings, but unfortunately none of them have really helped to alleviate my confusion.
How to create a histogram with varying bin widths
In the latter posting, the part that confuses me is this line:
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle
I thought that the using entries were the X and then Y values (the X position and the height of the box). So when I read the above line, it calls the function hist to get the X value, but then Y is (1.0), and that part doesn't make sense to me.
Even my own plot line is confusing to me (although it appears to work), because for using I use 2:xtic(1). It isn't clear to me why 1:2 doesn't work properly.
Hopefully this question is understandable. I've only been using gnuplot since this weekend, so it's fair to say I'm still on the learning curve.