Large grouped data plotting

Question

I have a large amount of data to plot, and I'm trying to use gnuplot. The data is a sorted array of around 80000 elements. By simply using

plot "myData.txt" using 1:2 with linespoints linetype 1 pointtype 1

I get the output, but it takes time to render, and the points are often cluttered, with occasional gaps. To address the second problem, I thought of drawing a bar chart in which each entry corresponds to a bar. However, I'm not sure how to achieve this. I would like some space between consecutive bars, though I don't expect it would be visible. How would you suggest plotting the data?

........................

Due to the large data volume, I guess it's best to group the data. Note that my data looks like

I would like to plot the data in groups of 3, i.e., the first vertical line should start at 9521.07, the minimum of points 1, 2 and 3, and end at 11041. The second vertical line should cover the next 3 points (4, 5 and 6), starting at 9519.07 and ending at 9521.07, and so on.

Could this be achieved with gnuplot, given the data file as illustrated? If so, I would appreciate it if someone could post the set of commands I should use.

andyras · Accepted Answer · 2012-05-02T14:09:19.170

To reduce the number of points gnuplot actually draws, you can use the every keyword, e.g.

plot "myData.txt" using 1:2 with linespoints linetype 1 pointtype 1 every 100

will plot every 100th data point.

I am not sure if it's possible to do what you want (plotting vertical lines) elegantly within gnuplot, but here is my solution (assuming a UNIX-y environment). First make an awk script called sort.awk:

BEGIN { RS = "" }  # paragraph mode: records are separated by blank lines
{
 # fields alternate x, y: $1 $2, $3 $4, $5 $6
 # (a short final record simply has fewer fields)
 # x position of the bar: first x of the group plus one,
 # i.e. the middle point when x counts up in steps of 1
 xval = $1 + 1
 ymin = ymax = $2
 # scan the remaining y values (every second field)
 for (i = 4; i <= NF; i += 2) {
  if ($i + 0 < ymin + 0) ymin = $i
  if ($i + 0 > ymax + 0) ymax = $i
 }
 # print the formatted line:
 # x box_min whisker_min whisker_high box_high
 print xval " " ymin " " ymin " " ymax " " ymax
}

Now this gnuplot script will call it:

set terminal postscript enhanced color
set output 'plot.eps'

set boxwidth 3
set style fill solid
plot "<sed 'n;n;G;' myData.txt | awk -f sort.awk" with candlesticks title 'pretty data'

It's not pretty but it works. sed adds a blank line every 3 lines, and awk formats the output for the candlesticks style. You can also try embedding the awk script in the gnuplot script.
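For what it's worth, the sed step and the separate sort.awk file can probably be folded into one awk command. The following is only a sketch under assumptions: the file names sample.txt and grouped.txt and most of the sample values are made up for illustration (only 9519.07, 9521.07 and 11041 come from the question), column 1 is taken to be x and column 2 to be y, and each bar is placed at the first x of its group plus one.

```shell
# Made-up sample data (x y); file name and most values are hypothetical.
cat > sample.txt <<'EOF'
1 11041
2 9530.5
3 9521.07
4 9521.07
5 9520
6 9519.07
7 9500
EOF

# Group every n lines and print one line per group in candlesticks order:
#   x  ymin  ymin  ymax  ymax
awk -v n=3 '
  (NR - 1) % n == 0 {              # first line of a new group
    if (NR > 1) print x, lo, lo, hi, hi   # flush the previous group
    x = $1 + 1; lo = hi = $2
    next
  }
  {                                # remaining lines of the group
    if ($2 + 0 < lo + 0) lo = $2
    if ($2 + 0 > hi + 0) hi = $2
  }
  END { if (NR > 0) print x, lo, lo, hi, hi }   # flush the last (possibly short) group
' sample.txt > grouped.txt

cat grouped.txt
```

With this sample, the first two output lines should span 9521.07 to 11041 and 9519.07 to 9521.07, matching the grouping described in the question. The resulting grouped.txt could then be plotted with candlesticks in place of the "<sed ... | awk ..." pipe.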

Thanks. I decided to group the data, as can be observed from my question edit. I hope you'll have some useful comments. — user506901, May 02 '12 at 10:59

score 1 · Answer 2 · edited May 23 '17 at 10:24

You can do something like this (it will be easiest on Unix). You will need to insert a blank line after every third line; I don't see any way around that. If you're on Unix, the command

awk '1; NR % 3 == 0 {print ""}' myfile

should do it. (See "How do I insert a blank line every n lines using awk?")

Of course, you could (and probably should) pack that straight into your gnuplot file.

So, all said and done, you'd have something like this:

xval(x) = (int(x) - 1)/3  # x position of the group (integer division); assumes column 1 counts 1, 2, 3, ...
plot "< awk '1; NR % 3 == 0 {print \"\"}' datafile" using (xval($1)):2 with lines
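To check where the blank lines end up before handing the file to gnuplot, it may help to run the awk step on a few lines by hand. This is a small sketch with made-up data (the file names sample6.txt and blocks.txt are hypothetical); it uses a variant that prints each line first and the blank line afterwards, so the separator lands after each complete group of three.

```shell
# Six made-up data lines (x y); file names and most values are hypothetical.
printf '%s\n' '1 11041' '2 9530.5' '3 9521.07' \
              '4 9521.07' '5 9520' '6 9519.07' > sample6.txt

# Print every line, then add an empty line after each third one.
awk '1; NR % 3 == 0 { print "" }' sample6.txt > blocks.txt

cat blocks.txt
```

gnuplot treats a single blank line as a break between data blocks, so a plot drawn with lines is not connected across the groups.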
