
I use matplotlib for a signal processing application and I noticed that it chokes on large data sets. This is something I really need to improve to make it a usable application.

What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestions on how to implement this are welcome.

Some code:

import numpy as np
import matplotlib.pyplot as plt

n=100000 # more than 100000 points makes it unusably slow
plt.plot(np.random.random_sample(n))
plt.show()

Some background information

I used to work on a large C++ application where we needed to plot large datasets and to solve this problem we used to take advantage of the structure of the data as follows:

In most cases, if we want a line plot then the data is ordered and often even equidistant. If it is equidistant, then you can calculate the start and end index in the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
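The binary-search step can be sketched with NumPy's `searchsorted` (a minimal sketch: the array and the zoom limits are made-up values standing in for the data and for something like `ax.get_xlim()`):

```python
import numpy as np

# Ordered x data; the zoom rectangle gives the visible range in data
# coordinates. A binary search finds the visible slice directly,
# without scanning the whole array.
x = np.linspace(0.0, 100.0, 1_000_001)   # ordered (here even equidistant)
xmin, xmax = 42.0, 43.0                  # hypothetical zoom limits

start = np.searchsorted(x, xmin, side="left")
stop = np.searchsorted(x, xmax, side="right")
visible = x[start:stop]                  # only the points inside the view
```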

Next the zoomed slice is decimated, and because the data is ordered we can simply iterate over the blocks of points that fall inside one pixel. For each block the mean, maximum and minimum are calculated, and instead of a single point we then draw a bar in the plot.

For example: if the x axis is ordered, a vertical line is drawn for each block, with the mean possibly drawn in a different color.
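The block-wise reduction can be sketched in NumPy (the function name and block size here are made up for illustration; a real implementation would derive the block size from the pixel width of the axes):

```python
import numpy as np

def decimate_minmax(y, block):
    # Reduce each run of `block` consecutive samples (roughly the samples
    # that map onto one horizontal pixel) to its mean, min and max.
    m = (len(y) // block) * block        # drop the ragged tail
    blocks = y[:m].reshape(-1, block)
    return blocks.mean(axis=1), blocks.min(axis=1), blocks.max(axis=1)

y = np.random.random_sample(1_000_000)
mean, lo, hi = decimate_minmax(y, 1000)  # ~1000 bars instead of 1e6 points
```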

To avoid aliasing the plot is oversampled with a factor of two.

In case it is a scatter plot, the data can be made ordered by sorting, because the sequence of plotting is not important.

The nice thing about this simple recipe is that the more you zoom in, the faster it becomes. In my experience the plots stay very responsive as long as the data fits in memory. For instance, 20 plots of time-history data with 10 million points each should be no problem.

Francesco Montesano
Luke
  • Could you implement such a decimation algorithm outside of `matplotlib` rendering, just updating the data to be displayed upon zooming event? – Joël Dec 12 '13 at 14:54
  • Possibly of interest here: [How can I subsample an array according to its density?](https://stackoverflow.com/questions/53543782/how-can-i-subsample-an-array-according-to-its-density-remove-frequent-values) – ImportanceOfBeingErnest Dec 14 '18 at 11:19

2 Answers


It seems like you just need to decimate the data before you plot it:

import numpy as np
import matplotlib.pyplot as plt

n=100000 # more than 100000 points makes it unusably slow
X=np.random.random_sample(n)
i=10*np.arange(n//10) # keep every 10th point
plt.plot(X[i])
plt.show()
Chris Flesher

Decimation is not always best; for example, if you decimate sparse data it might all appear as zeros.

The decimation has to be smart, such that each horizontal display pixel is plotted with the min and the max of the data between decimation points. Then as you zoom in you see more and more detail.
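The spike-loss problem can be demonstrated with a small sketch (the data and block size are made up for illustration):

```python
import numpy as np

# Naive subsampling can drop a narrow spike entirely, while a per-block
# min/max envelope preserves it.
y = np.zeros(1_000_000)
y[500_001] = 1.0                 # one narrow spike in otherwise sparse data

naive = y[::1000]                # plain decimation: the spike is lost
blocks = y.reshape(-1, 1000)     # 1000 samples per horizontal pixel
lo, hi = blocks.min(axis=1), blocks.max(axis=1)
# Drawing the envelope, e.g. plt.fill_between(np.arange(len(lo)), lo, hi),
# keeps the spike visible at any zoom level.
```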

Because of zooming, this cannot easily be done outside matplotlib, so it is better handled internally.