
I'm trying to create a subplot of several pcolormesh graphs that would be like this one:

enter image description here

I can do this in a single process:

import matplotlib.pyplot as plt
import numpy as np

def create_spectrogram(data, x, y, ax):
    ax.pcolormesh(x, y, data)


def do_it_simple():
    size = 10
    data = np.arange(size * size).reshape((size, size))

    y = np.arange(0, 10)
    x = np.arange(0, 10)

    fig, axes = plt.subplots(nrows=2, ncols=5)
    axes_list = [item for sublist in axes for item in sublist]

    for ax in axes_list:
        create_spectrogram(data, x, y, ax=ax)

    plt.savefig("test_simple.png")

def main():
    do_it_simple()

if __name__ == "__main__":
    main()

Assuming my data array is really big, there are two very slow steps:

  • ax.pcolormesh(x, y, data)
  • plt.savefig()

I was wondering if I could use multiprocessing to parallelize the pcolormesh process. Now I have:

import matplotlib.pyplot as plt
import numpy as np
from multiprocessing import Pool
from functools import partial

def multiprocesser_handler(data_set, ax):
    (data, x, y) = data_set
    ax.pcolormesh(x, y, data)
    return ax


def do_it_multiprocessed():
    size = 10
    data = np.arange(size * size).reshape((size, size))

    y = np.arange(0, 10)
    x = np.arange(0, 10)

    fig, axes = plt.subplots(nrows=2, ncols=5)
    axes_list = [item for sublist in axes for item in sublist]

    data_set = (data, x, y)

    func = partial(multiprocesser_handler, data_set)

    with Pool(3) as p:
        axes_list = p.map(func, axes_list)

    print("DEBUG1:", axes_list)
    for ax in axes_list:
        print("DEBUG2:",ax.get_children())

    plt.savefig("test_multiprocess.png")

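def main():
    do_it_multiprocessed()

# The guard keeps worker processes from re-running main() on import.
if __name__ == "__main__":
    main()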

The results are:

DEBUG1 [<matplotlib.axes._subplots.AxesSubplot object at 0x7fa6335fc2e8>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fa633524080>, ....

DEBUG2 [<matplotlib.collections.QuadMesh object at 0x7fa63129c278>, <matplotlib.spines.Spine object at 0x7fa6312f8dd8>, <matplotlib.spines.Spine object at 0x7fa63130d4e0>, <matplotlib.spines.Spine object at 0x7fa63130d5f8>, <matplotlib.spines.Spine object at 0x7fa631295fd0>, <matplotlib.axis.XAxis object at 0x7fa63130d668>, <matplotlib.axis.YAxis object at 0x7fa6312f8e48>, Text(0.5, 1.0, '2005'), Text(0.0, 1.0, ''), Text(1.0, 1.0, ''), <matplotlib.patches.Rectangle object at 0x7fa63129cb38>]
DEBUG2 [<matplotlib.collections.QuadMesh object at 0x7fa6334fdb70>, <matplotlib.spines.Spine object at 0x7fa633260438>, <matplotlib.spines.Spine object at 0x7fa63336f128>, <matplotlib.spines.Spine object at 0x7fa63336f1d0>, <matplotlib.spines.Spine object at 0x7fa6334fd908>, <matplotlib.axis.XAxis object at 0x7fa63336f550>, <matplotlib.axis.YAxis object at 0x7fa633260b00>, Text(0.5, 1.0, '2006'), Text(0.0, 1.0, ''), Text(1.0, 1.0, ''), <matplotlib.patches.Rectangle object at 0x7fa63352f470>]
...

It seems that I managed to parallelize the pcolormesh computation correctly (the returned axes contain matplotlib.collections.QuadMesh objects). But when I open the resulting figure I get this:

enter image description here

I guess that I am not merging the AxesSubplot objects back into the figure correctly. Any idea how to do this?

I also tried imshow(), which is known to be more efficient than pcolormesh, but I ran into some issues that are described here: Change y log scale imshow()

Thanks in advance!

n0n0bstan
  • matplotlib is pretty slow and not sure if you'd be able to do much with multiprocessing. maybe you could transform the data so it fits on a regular grid then you'd be able to use [`imshow`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html) which is **much** faster – Sam Mason Jul 30 '19 at 11:16
  • Hi @SamMason! Thanks for your suggestion! Sadly, I have not been able to use imshow in this case: the log scale does not suit my needs – n0n0bstan Jul 30 '19 at 11:30
  • hence the suggestion to "transform your data"... maybe you could post some code to make your `x`, `y` and `data` variables (i.e. probably using `logspace`, `linspace`, and `random` from numpy) you'd get more directly useful answers – Sam Mason Jul 30 '19 at 11:38
  • I just explained my imshow issue by creating the following post: https://stackoverflow.com/questions/57271587/how-to-change-y-log-scale-imshow – n0n0bstan Jul 30 '19 at 12:27
  • In the comments of https://stackoverflow.com/questions/57271587/change-y-log-scale-imshow you proposed resampling the data. I do not want to do that: it would take a lot of coding effort, it could introduce mistakes, and cutting the data to apply several FFTs may create side effects when merging. I have access to a cluster, so my goal is to parallelize – n0n0bstan Jul 30 '19 at 13:56
  • I meant the `data` variable that's passed to `ax.pcolormesh`. I presume this is the output of the FFT, so no need to do multiple FFTs. you shouldn't need to parallelise anything, this isn't much data to push around – Sam Mason Jul 30 '19 at 14:22
  • Oh sorry, I didn't understand correctly! Do you mean cutting the data before passing it to pcolormesh(), getting several QuadMesh objects, and merging them afterwards? I didn't know we could proceed this way... do you have a link to a tutorial? – n0n0bstan Jul 30 '19 at 14:42
  • just discard most of the data (or get the mean). displays aren't 15k pixels wide, and if you're displaying 4 plots beside each other you certainly don't need to plot all the data. if you're displaying at 300dpi that would be over five meters wide, so something is almost certainly resampling your image somewhere – Sam Mason Jul 30 '19 at 16:01
  • If you can provide a [mcve] (i.e. runnable code) for both the working and non-working case, it would all be a lot easier. (For that matter, consider that the problem you want help with does not depend on large pickled files!) – ImportanceOfBeingErnest Jul 30 '19 at 21:18
  • You're right, I just edited my question to make the code reproducible – n0n0bstan Jul 31 '19 at 13:38
  • If you put `print(fig == ax.figure)` into the loop it will print all `False`, meaning each process has its own figure. You could save each process's figure via `ax.figure.savefig(...)` inside the loop. – ImportanceOfBeingErnest Aug 19 '19 at 09:25
  • I may recommend [this question](https://stackoverflow.com/questions/15857838/modify-object-in-python-multiprocessing). So one might conclude that multiprocessing is not well suited for the problem at hand. – ImportanceOfBeingErnest Aug 19 '19 at 09:41
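
The point made in the last two comments can be reproduced without starting any worker processes. multiprocessing sends arguments and return values through pickle, so each Axes handed to Pool.map arrives in the worker as a copy attached to its own copy of the figure. The sketch below is only an illustration under some assumptions: it mimics that transfer with a plain pickle round trip, it selects the Agg backend so that it runs without a display, and it passes bin edges (np.arange(11)) rather than the question's coordinate arrays so that the call behaves the same across matplotlib versions.

```python
import pickle

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(nrows=2, ncols=5)
ax = axes[0, 0]

# A pickle round trip stands in for what Pool.map does to each Axes
# when it is sent to a worker process and returned.
ax_copy = pickle.loads(pickle.dumps(ax))

# Draw on the copy only. x and y are the 11 cell edges of a 10x10 grid.
data = np.arange(100).reshape((10, 10))
edges = np.arange(11)
ax_copy.pcolormesh(edges, edges, data)

same_figure = ax_copy.figure is fig        # is the copy attached to our figure?
drawn_on_original = len(ax.collections)    # QuadMesh objects on the original Axes
drawn_on_copy = len(ax_copy.collections)   # QuadMesh objects on the copied Axes
print(same_figure, drawn_on_original, drawn_on_copy)
```

The QuadMesh ends up on the copy and the original figure stays empty, which is consistent with the blank panels in the saved image. Following the suggestion in the comments, one possible workaround is to let each worker create and save its own figure (for example via `ax.figure.savefig(...)`) instead of trying to merge the returned Axes into the parent figure.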
