I am creating a single figure with around one hundred subplots/axes, each containing a few thousand data points. Currently, I loop over the subplots and call `plt.scatter` to place the points in each one, but this is quite slow. Is it possible to use multiple CPUs to speed up the plotting, either by assigning one core per subplot or by parallelizing the plotting of the data points within a single subplot?
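For context, the serial version looks roughly like the sketch below (the grid shape, placeholder data, and marker size are illustrative, not my real setup):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: ~100 groups with a few thousand points each
rng = np.random.default_rng(0)
data = [rng.normal(size=(3000, 2)) for _ in range(100)]

fig, axes = plt.subplots(10, 10, figsize=(40, 40))

# Serial loop: one scatter call per subplot
for ax, points in zip(axes.ravel(), data):
    ax.scatter(points[:, 0], points[:, 1], s=1)

fig.savefig('all_subplots.pdf')
```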
So far, I have tried using joblib to create the subplots in parallel processes, but rather than adding new subplots to the same figure, it spawns a new figure for each subplot. I have tried the `PDF`, `Qt5Agg`, and `Agg` backends. Here is a simplified example of my code:
```python
import matplotlib as mpl
mpl.use('PDF')
import seaborn as sns
import matplotlib.pyplot as plt
from joblib import Parallel, delayed

def plotter(name, df, ax):
    ax.scatter(df['petal_length'], df['sepal_length'])

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(3, 1)
Parallel(n_jobs=2)(
    delayed(plotter)(species_name, species_df, ax)
    for (species_name, species_df), ax in zip(iris.groupby('species'), axes.ravel())
)
fig.savefig('test.pdf')
```
Setting `n_jobs=1` works; all points are then plotted within the same figure. However, increasing it above one creates four figures: the one I initiate with `plt.subplots` and then one more for each time `ax.scatter` is called.
Since I am passing the axes from the first figure to `plotter`, I am not sure how or why the additional figures are created. Is there some fallback in matplotlib that causes new figures to be created automatically if the specified figure is "locked" by another plotting process?
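One way I could inspect this would be a debugging variant of `plotter` that prints the worker's process id and the figure each `Axes` reports as its parent (this variant is only a sketch for illustration, not part of my actual code):

```python
import os

def plotter(name, df, ax):
    # Print the worker's process id and the Figure this Axes belongs to,
    # to see where each parallel scatter call actually ends up.
    print(os.getpid(), name, ax.figure)
    ax.scatter(df['petal_length'], df['sepal_length'])
```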
Any advice on how to improve my current approach, or on achieving the speedup through alternative approaches, would be appreciated.