I managed a fairly nice example of variable-width boxes last night. I was plotting latency histogram data produced by the FIO storage performance test package. With my compile options I have 1856 bins that go as follows:
My latency values at plot time are in microseconds (FIO provides nanoseconds, but I wanted microseconds for historical reasons). I did not have the opportunity to include the bin widths in my data. So I did this:
- f(x) = (2**(int(log(x*1000)/log(2))-6))/1000.0
- plot "temp" u 1:2:(f(column(1))) with boxes fs transparent solid 0.7 noborder title "$legend"$base_plot
The f(x) definition returns the box width for a given latency - it works as follows:
- First, x*1000 gets me back to nanoseconds.
- log(x*1000)/log(2) takes the base 2 logarithm of the nanosecond count.
- The int() just gives me the integer part of that. Note that now for, say, 128 ns, I'd have 7.
- The -6 gets me to the base 2 log of the bin width, because each power-of-two range is split into 64 (2**6) bins.
- The 2 ** gets me to the bin width.
- The /1000 returns me from nanoseconds to microseconds.
Then I just use f(latency) in the plot command as the box width.
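To make the steps easier to check, here is a sketch of the same calculation split into named pieces. The helper names (us_to_ns, log2_floor, width_ns) are hypothetical and not part of the original script. One detail worth watching: gnuplot truncates division when both operands are integers, and 2**n with a non-negative integer n is itself an integer, so the sketch divides by the real value 1000.0 to keep fractional microsecond widths.

```gnuplot
# Sketch only: helper names are made up for illustration.
us_to_ns(x)    = x * 1000.0                 # latency in microseconds -> nanoseconds
log2_floor(ns) = int(log(ns) / log(2))      # integer part of the base-2 logarithm
width_ns(ns)   = 2 ** (log2_floor(ns) - 6)  # 64 bins per power-of-two range
f(x)           = width_ns(us_to_ns(x)) / 1000.0   # real divisor: no integer truncation

# Spot checks, latencies given in microseconds:
print f(0.2)      # 200 ns lies in [128, 256): width 2 ns      -> 0.002
print f(100.0)    # 100000 ns lies in [65536, 131072): 1024 ns -> 1.024
```

The spot-check values deliberately sit well inside a bin. Right at a power-of-two boundary, log(x)/log(2) is subject to floating-point rounding, so results there are worth checking against the actual bin table.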
As far as I can tell, this works perfectly. It does not give the right result for x < 64 ns, but I don't have any data that small, so it works out. A conditional expression could be used to patch it up for that part of the range.
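A minimal sketch of such a conditional patch, using gnuplot's ternary operator. It assumes that bins below 64 ns are 1 ns wide, which is not stated above and should be checked against your own bin layout. The name bin_exp is hypothetical.

```gnuplot
# Assumption: below 64 ns every bin is 1 ns wide (verify against your bin table).
bin_exp(x) = int(log(x * 1000.0) / log(2)) - 6     # base-2 log of the bin width in ns
f(x) = (bin_exp(x) < 0 ? 1 : 2 ** bin_exp(x)) / 1000.0

print f(0.032)    # 32 ns: exponent is negative, so the 1 ns fallback gives 0.001
print f(0.2)      # 200 ns: unchanged from the unpatched formula, 0.002
```

Clamping the exponent at zero leaves every latency of 64 ns and above exactly as before, so the patch only changes the part of the range where the original formula was wrong.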
I think the key observations here are that a) you don't have to have the width as literal data - if you can calculate it from the data you do have, you're golden, and b) column(n) is an alternative to $n as a way of expressing column values in the plot command. In my case I have all this in a bash script, and bash intercepted the $1.
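For reference, the two spellings look like this side by side. This is a sketch that reuses the "temp" data file from above and a real-valued divisor in f(x). Both commands draw the same plot when typed into gnuplot directly.

```gnuplot
f(x) = (2 ** (int(log(x * 1000.0) / log(2)) - 6)) / 1000.0

# Both lines read column 1 of the data file and pass it to f():
plot "temp" using 1:2:(f($1))        with boxes   # fine at the gnuplot prompt
plot "temp" using 1:2:(f(column(1))) with boxes   # also safe inside a shell script
```

When the command is embedded in a double-quoted bash string or an unquoted here-document, the shell expands $1 before gnuplot sees it, which is why column(1) is the more robust choice there.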