
I just started learning gnuplot, and I currently need to plot some histograms. My data set is very compact around the mean and has a long tail, so the boxes and xtic labels overlap where the data is compact. One of my initial ideas was to use variable-width bins to control the spacing, but this did not work well. In the end, I settled on generating separate plots for different sigma ranges (e.g. 1σ, 2σ, 3σ), which limits the x range. For larger sigmas, I switch the plot style to linespoints; the xtic labels still overlap, but the graph is a little more readable.
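A minimal sketch of the sigma-range approach: gnuplot's `stats` command provides the mean and standard deviation, and the x range is then clamped to mean ± N sigma before binning. It assumes the raw, unbinned values are available (one per line). The ITEMS values from the table below are reused here only as stand-in raw values, and `nsigma`, `binwidth` and `bin()` are hypothetical names chosen for illustration.

```gnuplot
# Stand-in raw values (the ITEMS column from the question, one value per line)
$raw << EOD
149
186
2986
3069
5827
5940
6189
6346
EOD

set terminal dumb size 100,25   # console preview; use qt/wxt/pngcairo for real output

# stats fills R_mean, R_stddev, R_min, R_max, R_records, ...
stats $raw using 1 name "R" nooutput

nsigma = 1                      # hypothetical: 1, 2, 3, ... sigma window
lo = R_mean - nsigma * R_stddev
hi = R_mean + nsigma * R_stddev
set xrange [lo:hi]              # only show mean +/- nsigma * sigma

binwidth = 500.0                # hypothetical bin width in x units
bin(x) = binwidth * floor(x / binwidth) + binwidth / 2.0   # centre of x's bin
set boxwidth binwidth
set style fill solid 0.5

# Each raw value adds 1.0 to its bin; smooth frequency sums per bin centre
plot $raw using (bin($1)):(1.0) smooth frequency with boxes notitle
```

Changing `nsigma` to 2 or 3 widens the window without touching the rest of the script.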

However, doing this sparked my curiosity about the proper way to do variable-width bins. Is this even possible?

Here is some example data. It is only a small subset, which is why there are noticeable gaps.

ITEMS   COUNT   SCALE
149     60      1
186     811     1
2986    180     2.21622
3069    189     2.21622
5827    45      3.13514
5940    37      3.13514
6189    34      4.2973
6346    32      4.2973

SCALE is meant to be the bin-width scale factor. Note that COUNT is already adjusted for the scale; the idea is that a wider box indicates that a scale has been factored into the count.
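One way to get variable-width bins, sketched under an assumption: the `boxes` style accepts a third `using` column that gives each box's width in x-axis units, so SCALE can be turned into a width by multiplying it by a base width. `base_width` is a hypothetical value (the real width of a SCALE = 1 bin isn't stated), and the ITEMS values are plotted on a numeric x axis instead of as xtic labels.

```gnuplot
# The question's table without the header line: ITEMS COUNT SCALE
$bins << EOD
149     60      1
186     811     1
2986    180     2.21622
3069    189     2.21622
5827    45      3.13514
5940    37      3.13514
6189    34      4.2973
6346    32      4.2973
EOD

set terminal dumb size 100,25   # console preview; use qt/wxt/pngcairo for real output
base_width = 30.0               # hypothetical width of a SCALE = 1 bin, in x units
set style fill solid 0.5 border
set xtics rotate

# using x:y:x_width -> x = ITEMS (real position), y = COUNT,
# box width = SCALE * base_width
plot $bins using 1:2:($3 * base_width) with boxes lc rgb 'blue' notitle
```

Because x is numeric here, gnuplot chooses the tic positions itself instead of labelling every box, which sidesteps the overlapping labels; the trade-off is that the boxes are no longer evenly spaced on the page.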

The setup for the plot was pretty basic:

set style data histogram
set style histogram cluster gap 1
set xtics rotate
plot '-' u 2:xtic(1) with boxes t '' lc rgb 'blue'

The part I am still confused about is the syntax of the entries given to `using`. I've seen some other posts, but unfortunately none of them really cleared up my confusion:

How to create a histogram with varying bin widths

Histogram using gnuplot?

In the latter post, the part that confuses me is this line:

plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle

I thought the `using` entries were the X value and then the Y value (the X position and the height of the box). Reading the line above, it calls the function hist to get the X value, but then Y is (1.0), and that part doesn't make sense to me.
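A sketch of how the `smooth frequency` idiom works. The `hist()` definition below is the commonly used one (bin centre = `width*floor(x/width) + width/2`); it is an assumption that the linked answer defines it the same way. The first `using` entry is the x value, i.e. the bin centre computed for each raw value. `(1.0)` is the y value: a constant, where the parentheses make it an expression rather than a column number. `smooth frequency` then replaces all points sharing the same x with a single point whose y is their sum, so each bin's height is the number of raw values that fell into it. The ITEMS values are reused as stand-in raw data.

```gnuplot
# Stand-in raw values (ITEMS column from the question)
$raw << EOD
149
186
2986
3069
5827
5940
6189
6346
EOD

set terminal dumb size 100,25   # console preview; use qt/wxt/pngcairo for real output
width = 1000.0                                            # bin width
hist(x, width) = width * floor(x / width) + width / 2.0   # centre of x's bin

# x = hist($1,width): every raw value is mapped to its bin centre
# y = (1.0): constant contribution of each raw value
# smooth frequency: equal x values are merged and their y values summed,
# so a bar's height is (number of values in that bin) * 1.0
set boxwidth width
set style fill solid 0.5
plot $raw using (hist($1, width)):(1.0) smooth frequency with boxes lc rgb "green" notitle
```

Any other constant scales every bar by that factor; for example, `(1.0/8)` would turn the counts for these eight values into fractions.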

Even my own plot line is confusing to me (although it appears to work), because in `using` I give 2:xtic(1). It isn't clear to me why 1:2 doesn't work properly.
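A sketch comparing the two `using` forms on the first four rows of the table. When only one data column is given alongside `xtic()`, gnuplot uses the row number (`$0`, counting from 0) as x, so `2:xtic(1)` behaves like `0:2:xtic(1)`: evenly spaced boxes labelled with the ITEMS text. With `1:2`, the ITEMS values become real x coordinates, and since by default adjacent boxes are widened until they touch, the box widths follow the gaps between neighbouring values, which is probably why 1:2 looked wrong. Also, because the plot command says `with boxes` explicitly, the `set style data histogram` and `set style histogram cluster gap 1` lines don't affect this plot.

```gnuplot
# First four rows of the question's table: ITEMS COUNT
$bins << EOD
149     60
186     811
2986    180
3069    189
EOD

set terminal dumb size 100,25   # console preview; use qt/wxt/pngcairo for real output
set style fill solid 0.5
set boxwidth 0.8 relative       # 80% of the automatic width, leaving small gaps

# 1) Explicit form of "u 2:xtic(1)": x = row number ($0 = 0,1,2,3),
#    y = COUNT, tic labels = ITEMS text. Boxes are evenly spaced.
plot $bins using 0:2:xtic(1) with boxes lc rgb 'blue' notitle

# 2) "u 1:2": x = the numeric ITEMS value. Boxes sit at 149, 186, 2986, 3069,
#    and their automatic widths follow the gaps between neighbouring x values.
plot $bins using 1:2 with boxes lc rgb 'blue' notitle
```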

Hopefully this question is understandable. I've only been using gnuplot since this weekend, so I'm still early on the learning curve.

Mobile Ben
  • See the explanation of `smooth frequency` at https://stackoverflow.com/a/16382426/2604213 – Christoph Oct 01 '18 at 09:33
  • @Christoph thanks, I took a look. I'll experiment with it. – Mobile Ben Oct 03 '18 at 07:04
  • @Christoph okay, that was super helpful. I had pre-binned my data, but since I'm actually using gnuplot-iostream, fortunately I still had the unbinned data. It appears the `(1.0)` controls the y values; it behaves like a scale factor. Thanks so much! – Mobile Ben Oct 05 '18 at 00:11

0 Answers