gnuplot slow when plotting large data set as animation

Question

I'm trying to make an "animated" plot of a lot of data (the positions of 1000 particles) from a big text file with a script like:

set terminal wxt size 1000,600
k=999999
N = 999
do for [i=0:k]{
plot for [j=0:N-1] "pos.txt" using 2*j+1:2*j+2  every ::2*i+1::2*i+1 ls 1 pt 7 ps 2 notitle
}

In the file, every line has the coordinates X and Y at a certain time of the points I want to plot. I'm using every to plot all the data in each row once and then move on to the next row.

The output looks something like this (1000 particles moving): [image]

However, the plotting is far too slow and I don't know how to speed it up. It plots one row every 5 seconds or more. The file weighs a few MB. Should I change the terminal? Or the way I store the data? I think there might be a problem when gnuplot loads a big file.
Some particles disappear during the simulation, so I also get the error line 14: warning: Skipping data file with no valid points when the index j (well, 2j+1) exceeds the number of particles. I tried reading the number of particles at each time step, but that is even slower. Many thanks.

If you are only interested in the end animation you could export the images to a gif or use something like ffmpeg to generate a video. Then the playback won't be limited by how fast gnuplot can process the data file so you would be able to view your animation at any desirable frame rate. — ilent2, Jun 16 '15 at 06:52
That's a good idea. However, since gnuplot takes ages to plot what I want with my script, it would need hours to complete the animation, so even if I can watch the animation at full speed that's still a problem. — Nister, Jun 16 '15 at 17:38
Googlers may also be interested in this survey that I've done of different plotting software with a 10 million point count: https://stackoverflow.com/questions/5854515/large-plot-20-million-samples-gigabytes-of-data/55967461#55967461 — Ciro Santilli OurBigBook.com, May 03 '19 at 10:06

score 3 · Accepted Answer · answered Jun 16 '15 at 04:46

I suspect gnuplot is reading the whole file every time you plot, as opposed to reading up to the line in question, then the next line, and so on. One possible strategy is to split your particle trajectories into separate files, but what should help most is replacing the plot for loop with a single plot plus a block selection with every: instead of selecting the columns for each particle, you store all particle positions for the same time step in the same block.

Now your data looks something like this:

x1 y1 x2 y2 x3 y3 # Time step 1
x1 y1 x2 y2 x3 y3 # Time step 2

With this layout, gnuplot has to read the file once for every time step and every particle. If you structure the file as follows instead (note the single blank line between blocks):

# Time step 1
x1 y1
x2 y2
x3 y3

# Time step 2
x1 y1
x2 y2
x3 y3

Then you don't need the plot for. Instead, select the block that holds all the particles for one time step by inserting one extra colon in every:

set terminal wxt size 1000,600
k=999999
#N = 999 you don't need this anymore!
do for [i=0:k] {
plot "pos.txt" every :::i::i
}

The code above reads the file for every time step, rather than every time step and particle, and plots all the particles at once.
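If the data already exists in the one-row-per-time-step layout, it can be converted rather than regenerated. The following is a minimal sketch of such a conversion, not something taken from the question: the function name `wide_to_blocks` and the demo data are made up for illustration, and it assumes whitespace-separated values with an optional trailing `#` comment. Rows with fewer values simply produce shorter blocks, which also covers particles that disappear during the simulation.

```python
import io


def wide_to_blocks(src, dst):
    """Rewrite 'x1 y1 x2 y2 ...' rows as one block of 'x y' lines per time step.

    src and dst are open text files (hypothetical names). Returns the number
    of blocks written.
    """
    frames = 0
    for line in src:
        fields = line.split("#", 1)[0].split()  # drop a trailing comment
        if len(fields) < 2:
            continue  # skip blank, comment-only, or incomplete rows
        if frames:
            dst.write("\n")  # a single blank line separates gnuplot blocks
        # Pair consecutive values as (x, y); a dangling odd value is ignored
        for x, y in zip(fields[0::2], fields[1::2]):
            dst.write(f"{x} {y}\n")
        frames += 1
    return frames


# Small in-memory demo: the second time step has one particle fewer
demo = io.StringIO("1 2 3 4 5 6 # Time step 1\n7 8 9 10 # Time step 2\n")
out = io.StringIO()
n_blocks = wide_to_blocks(demo, out)
print(out.getvalue())
```

For a real file, the same function could be called with `open("pos.txt")` as the source and a new output file as the destination; the output file name is then what the `plot ... every :::i::i` command should point to.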

That worked out really well. However, how is gnuplot reading the data in the first script? Is it going to the line `2i+1` then plotting columns `2j+1` and `2j+2` then again to the line `2i+1` and so on? (So it has to look for the same line for every particle position until it plots all particles) — Nister, Jun 16 '15 at 17:34

score 3 · Answer 2 · answered Jun 16 '15 at 21:01

If performance is critical, you may want to consider a completely different data format. Although changing the layout of the ASCII file gives a huge improvement, it scales badly, because gnuplot must always scan from the beginning of the data file to find the position at which to start. In my tests, plotting the first 1000 frames took 60s, whereas plotting frames 9000 to 10000 took 600s.

You would need a data format that allows you to seek to any data set in constant time. In my thesis I saved all my experimental data (huge data sets) as HDF5, and then you can use the external utility h5totxt to extract the desired data set. Here, the position of the requested data set can be calculated without scanning the whole file, so the access time is independent of the frame number.
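To illustrate why a computed position gives constant-time access, here is a small sketch using a plain binary file with fixed-size records. This is not how HDF5 works internally and not what h5totxt does; it is a simplified stand-in, and the names `RECORD`, `write_frames` and `read_frame` are invented for this example. It assumes every frame holds the same number of particles, so frame i always starts at byte offset i * n_particles * 16.

```python
import io
import struct

# One particle position: x and y as little-endian float64 (16 bytes)
RECORD = struct.Struct("<2d")


def write_frames(f, frames):
    """Write all frames back to back with no separators."""
    for frame in frames:
        for x, y in frame:
            f.write(RECORD.pack(x, y))


def read_frame(f, i, n_particles):
    """Read frame i by seeking to its computed offset instead of scanning."""
    frame_bytes = n_particles * RECORD.size
    f.seek(i * frame_bytes)
    data = f.read(frame_bytes)
    return [RECORD.unpack_from(data, k * RECORD.size) for k in range(n_particles)]


# In-memory demo: 5 frames of 3 particles, where x encodes the frame number
buf = io.BytesIO()
frames = [[(float(t), float(p)) for p in range(3)] for t in range(5)]
write_frames(buf, frames)
frame3 = read_frame(buf, 3, n_particles=3)
print(frame3)
```

With a text file, by contrast, lines have varying lengths, so the start of frame i is only known after reading everything before it.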

For testing I used the following python script to generate a test data file points.h5:

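import h5py
from numpy import random

# 10000 frames x 1000 particles x (x, y), drawn from a normal distribution
P = random.normal(size=(10000, 1000, 2))
# The context manager closes the file, so the data is flushed to disk
with h5py.File('points.h5', 'w') as f:
    f.create_dataset('points', data=P)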

The gnuplot script for plotting is

set terminal wxt size 1000,600
k=9999
do for [i=0:k] {
  plot sprintf("< h5totxt -s ' ' -x %d points.h5", i) using 1:2 ls 1 pt 7 ps 2 title sprintf("%d", i)
}

Now, plotting of 1000 frames takes 40s, no matter which frames you take (0-1000 or 9000-10000).
