
I am collecting a large amount of data that will be saved into individual H5 files using h5py. I would like to patch these images together into one pcolormesh plot to be saved as a single image.

A quick example I have been working on generates 2000x2000 arrays of random data points and saves them in H5 files using h5py. I then import the data from these files and try to plot it in matplotlib as a pcolormesh, but I always run into a MemoryError (which is expected).

import numpy
import h5py

# Write 100 files, each holding one 2000x2000 array of random values.
# The TEST_HDF5_SAVE_FILES directory must already exist.
for i in range(100):
    arr = numpy.random.random((2000, 2000))
    with h5py.File("TEST_HDF5_SAVE_FILES\\Plot_" + str(i) + ".h5", "w") as f:
        dset = f.create_dataset("Plot_" + str(i), data=arr)

This script generates my files. I picked 100 as an arbitrary number just to have a large enough set of files to pull from.

Then I import them using the following script:

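import numpy
import h5py
import matplotlib.pyplot as plt

y = numpy.arange(0, 2000, 1)

for display_plot_num in range(0, 5):
    print(display_plot_num)
    # Offset each tile along x so the tiles sit side by side.
    x = numpy.arange(display_plot_num*2000, display_plot_num*2000 + 2000, 1)

    # Read-only access is enough here, so open with "r" rather than "r+".
    with h5py.File("TEST_HDF5_SAVE_FILES\\Plot_" + str(display_plot_num) + ".h5", "r") as f:
        data = f["Plot_" + str(display_plot_num)][()]  # read the full tile into memory
        plt.pcolormesh(x, y, data)
plt.show()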

The range in the for loop can be raised up to 100, but the largest value I can choose without a memory error is 5 (i.e. 5 plots can be patched onto one pcolormesh plot in matplotlib), and even then it is extremely clunky and slow. I need to be able to patch together many more images than that.

Is there another technique I should use to plot this data? Alternatively, it would be nice if I could convert the data from multiple H5 files directly into an image without going through matplotlib or a similar package (like scipy).

In summary, my problem is this:

  • I have a large number of HDF5 files with image data (2000x2000)
  • I need to patch together these files into a single image and save it

Any help is appreciated. Also, I would be glad to answer any further questions about my problem.


Edit (5.6.2013):

I feel a similar question would be how to deal with (import, manipulate, edit, etc.) very high resolution images in Python. This is essentially what I am trying to do: generate a very high resolution image from a collection of smaller images.

Community
Mink
  • Try using `imshow` with `interpolation="none"` instead of `pcolormesh`. You'll need to change the way you specify the location of the image (e.g. use the `extent` kwarg instead of passing in x and y), but it should be faster. – Joe Kington Jun 04 '13 at 15:56
  • Can you downsample at all? – tacaswell Jun 04 '13 at 18:27
  • What dtype is the data? Can you get away with less precision? I doubt your eye will be able to tell the difference between a float32 and a float64 (or an int16). – matt Jun 04 '13 at 22:27
  • If you don't need it to be interactive, just use `cmap` to do the color mapping and then save directly to disk using `PIL` – tacaswell Jun 05 '13 at 04:16
  • http://stackoverflow.com/questions/14869321/is-there-a-way-to-convert-pyplot-imshow-object-to-numpy-array/14877059#14877059 – tacaswell Jun 05 '13 at 04:19
  • http://stackoverflow.com/a/8538444/380231 – tacaswell Jun 05 '13 at 04:21
  • Thanks for the comments. I am currently exploring whether they help my application. I have to stress that I must be able to build images from huge data sets. For example, I may need to build a single image from all of the files my example script above creates. Importing all of these files is well beyond the memory limit of the computer. Is there a way to construct an image from these files without loading all of the data into memory? Another idea is to construct multiple images and stitch them all together. I am not sure which would be the better approach. – Mink Jun 05 '13 at 10:24
  • If you came up with a nice solution, would be cool if you linked it here. thanks. – K.-Michael Aye Jun 05 '14 at 01:10
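
The suggestions in the comments above (downsample, apply a colormap, save with `PIL`) can be combined into a rough sketch that never holds more than one tile of float data in memory. This is not a tested solution from the thread. It assumes the tiles sit side by side along x as in the question, that values lie in [0, 1], and that the Pillow package is installed. The function name `stitch_tiles` and its `step`, `cmap_name`, `vmin` and `vmax` parameters are hypothetical names chosen for illustration.

```python
import numpy as np
import h5py
from matplotlib import cm, colors
from PIL import Image


def stitch_tiles(paths, names, out_path, tile_shape=(2000, 2000), step=1,
                 cmap_name="viridis", vmin=0.0, vmax=1.0):
    """Color-map each HDF5 tile and paste it into one 8-bit RGB mosaic.

    paths and names are parallel lists of file paths and dataset names.
    step > 1 keeps only every step-th sample in each direction.
    """
    rows = len(range(0, tile_shape[0], step))
    cols = len(range(0, tile_shape[1], step))
    # Maps scalar values to colors; vmin/vmax are assumed data limits.
    mapper = cm.ScalarMappable(norm=colors.Normalize(vmin, vmax),
                               cmap=cmap_name)
    # 3 bytes per pixel instead of 8 bytes per float64 sample.
    mosaic = np.empty((rows, cols * len(paths), 3), dtype=np.uint8)
    for i, (path, name) in enumerate(zip(paths, names)):
        with h5py.File(path, "r") as f:
            # Strided slicing reads only the selected samples from disk.
            tile = f[name][::step, ::step]
        rgba = mapper.to_rgba(tile, bytes=True)  # uint8 array, shape (rows, cols, 4)
        # Image row 0 is the top; use rgba[::-1] to match a plot whose
        # y axis increases upward.
        mosaic[:, i * cols:(i + 1) * cols] = rgba[..., :3]
    Image.fromarray(mosaic).save(out_path)
    return mosaic.shape
```

For the 100 files in the question, the full-resolution RGB mosaic would be 2000 x 200000 x 3 bytes, about 1.2 GB, while `step=4` brings it down to about 75 MB. A call might look like `stitch_tiles(paths, names, "mosaic.png", step=4)`, with `paths` and `names` built from the same `"Plot_" + str(i)` pattern used above.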

1 Answer


One way to reduce the bloat of images in matplotlib (especially when saving to SVG) is to use the `rasterized=True` kwarg. This essentially "flattens" your pcolormesh, which makes it much faster to save and less resource-hungry.
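
A minimal sketch of that kwarg in use, with a small random array standing in for the real data. The output file name and the choice of the non-interactive Agg backend are assumptions made for the example. Note that rasterizing only changes how the mesh is written into vector output; the array still has to fit in memory when it is plotted.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumed: no display needed, we only save to disk
import matplotlib.pyplot as plt
import numpy as np

data = np.random.random((200, 200))  # small stand-in for one tile

fig, ax = plt.subplots()
# rasterized=True stores the mesh as a single bitmap inside the SVG
# instead of one vector element per cell.
mesh = ax.pcolormesh(data, rasterized=True)

out_path = os.path.join(tempfile.gettempdir(), "mesh_rasterized.svg")
fig.savefig(out_path, dpi=150)  # dpi sets the resolution of the embedded bitmap
plt.close(fig)
```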

choldgraf