
I have a camera that takes high resolution images and stores them as large matrices. I am trying to construct an image from the data. (And it must be done in 32-bit Python.)

The data is saved in HDF5 and I am using h5py to access it, but I am unable to plot the data without a memory error because all of the methods that I know require all of the data to be dumped into the computer's memory. (I am only familiar with the usual matplotlib and scipy libraries.)

I have the same issue when I try to generate images from the data, but in a previous question I asked (Constructing high resolution images in Python) I was told that GDAL would be able to generate an image from the data.

I have done some research (it seems that GDAL for python is not very well documented) and came across this question: Can you loop through pixels in an image without loading the whole image?. The answer provided gives a quick script that imports an image row-by-row. Is there a way to do the opposite of this and save the image row-by-row? Then I would not have to load all of the data into the memory to save an image.

Or is there a method to generate a image (preferably a PNG) from a HDF5 dataset that is too large to load into the memory?
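As a sketch of the row-by-row idea: PNG's compression makes naive row streaming awkward with plain PIL, but the uncompressed binary PGM format can be written one row at a time using nothing but the standard library, since the header comes first and pixel rows simply follow in order. (Third-party packages such as pypng offer the same streaming pattern for PNG proper via `png.Writer.write` with a row iterator.) The sizes and random rows here are stand-ins; in the real case each row would come from the HDF5 dataset.

```python
import numpy as np

height, width = 100, 100   # stand-in for the 10000 x 10000 case
maxval = 255               # 8-bit greyscale

with open('streamed.pgm', 'wb') as f:
    # PGM header: magic, dimensions, max pixel value
    f.write(b'P5\n%d %d\n%d\n' % (width, height, maxval))
    for _ in range(height):
        # Only one row is ever in memory at a time; in practice this
        # row would be read from the HDF5 dataset instead.
        row = np.random.randint(0, 256, width).astype(np.uint8)
        f.write(row.tobytes())
```

The resulting file can be opened by most image viewers or converted to PNG afterwards with an external tool.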

Here is some example code I have been working with:

import numpy
import tables
import matplotlib.pyplot as plt
import scipy.misc

data = numpy.random.random_integers(0, 262143, (10000, 10000))

fileName = "array1.h5"
h5f = tables.openFile(fileName, "w")
array = h5f.createArray(h5f.root, "array1", data)
h5f.close()

h5f = tables.openFile(fileName, "r")
array_read = h5f.root.array1
print array_read[:]

# Method 1
scipy.misc.imsave('Test_random.png', array_read[:])

# Method 2
plt.imshow(array_read[:])
plt.show()

# Method 3
plt.pcolormesh(array_read[:])
plt.show()

It generates a 10000x10000 matrix and saves it in an H5 file with PyTables. I close the file and reopen it. Then I try to save an image or plot the data (commenting out two of the three methods to test each one).

If someone could provide some example code that would allow me to save this array stored in an H5 file to a PNG image, I would greatly appreciate it.

Mink
  • Can I ask why you want to save your array as a `.png` if that file would be too big to load into memory anyway? What do you plan on doing with it? It seems to me that you might be better off figuring out how to downsample your array (or subregions of it) for plotting/saving as images. – ali_m Jun 11 '13 at 12:27
  • @ali_m, a JPG or some other some other compressed image file would be okay... But I would like to preserve the original image as much as possible. If there appears to be no real solution to my question, I will probably resort to downsampling my data and saving the image that way. But my goal is to output a high resolution image with zooming capabilities for external processing. – Mink Jun 11 '13 at 12:35
  • 2
    My thinking was that actually storing your raw pixel data in a `.hdf5` file makes sense, and what you ought to do is figure out how to nicely downsample subregions of this raw data for display/saving out as image files. I don't see any real advantage to saving your raw data as an image file because I'm not aware of any convenient way to view an image file that's too big to fit in memory. – ali_m Jun 11 '13 at 12:40
  • @ali_m, Okay. Your logic makes more sense than what I was previously thinking of doing. I guess it was probably my lack of knowledge about working with large datasets that made me think that generating high res. image files would be easier to work with. It felt natural to think that I should include all of the data. Now that seems unnecessary and difficult. Thank you. – Mink Jun 11 '13 at 12:50
  • Good luck - you might find [this thread](http://stackoverflow.com/questions/13242382/resampling-a-numpy-array-representing-an-image) helpful. – ali_m Jun 11 '13 at 13:02
  • This is also closely related to: http://stackoverflow.com/questions/16921997/generating-pcolormesh-images-from-very-large-data-sets-saved-in-h5-files-with-py – tacaswell Jun 11 '13 at 21:17
  • @tcaswell, that is one of my other questions on this subject. Unfortunately, there were no answers that proved helpful enough to solve my problem. Ultimately, I think it comes down to what ali_m mentioned here about downsizing the data. I will need to focus more on downsizing the image data rather than trying to generate a high resolution image from all of my data. – Mink Jun 12 '13 at 09:26
  • Why don't you just save the image data in chunks? Each chunk being a HDF5 file... or is this not feasible? – razvanc Oct 23 '13 at 11:27

1 Answer


Many image formats have severe restrictions: they are optimized for displaying 8-bit RGB(A) color data. If your camera produces more significant bits per pixel, you need a different format. Apart from HDF5 (which is made for exactly this situation) I would only recommend TIFF, as it supports many different pixel formats and, in its BigTIFF variant, even file sizes larger than 4 GB. And TIFF is widespread.

Now you say that you are unable to plot the data without a memory error. As ali_m pointed out, this is a problem independent of the file format. If you cannot hold the whole 10000x10000 pixel image in memory, you must either display a downscaled version, or display only a zoomed-in part and allow scrolling to other parts, loading slices of your file from disk as needed (which will likely not be very fast).

HDF5 allows for fancy indexing, so you can easily read downsampled (by integer factors) parts of the image. The OS can then buffer the file access and hopefully speed things up a bit.
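For instance, a strided slice on an HDF5 dataset reads only every Nth pixel from disk. A minimal sketch using h5py (the question's PyTables dataset supports the same slicing; the file name `demo.h5`, dataset name `array1`, and small size are stand-ins):

```python
import numpy as np
import h5py

# Create a small demo dataset standing in for the real 10000x10000 one.
with h5py.File('demo.h5', 'w') as f:
    f.create_dataset('array1', data=np.arange(100 * 100).reshape(100, 100))

step = 10
with h5py.File('demo.h5', 'r') as f:
    # Strided slice: only ~1/step**2 of the data is read from disk.
    preview = f['array1'][::step, ::step]

print(preview.shape)   # (10, 10)
```

`preview` is small enough to hand straight to `plt.imshow` or `scipy.misc.imsave`.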

I'm not aware of any built-in downscaling ability in HDF5, but you can add that yourself (processing in chunks when you run into memory problems) using scipy or numpy routines.
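One way to do that chunked downscaling, sketched here with plain numpy block-averaging: process the image in horizontal bands so only one band is in memory at a time. The in-memory `data` array is a stand-in; in practice each band would be sliced from the HDF5 dataset.

```python
import numpy as np

factor = 10                       # downscale factor
height, width = 100, 100          # stand-in for 10000 x 10000
data = np.arange(height * width, dtype=float).reshape(height, width)

out = np.empty((height // factor, width // factor))
for i in range(0, height, factor):
    # One band of `factor` rows at a time (would come from HDF5).
    band = data[i:i + factor, :]
    # Block mean: average each factor x factor tile down to one pixel.
    out[i // factor] = band.reshape(factor, width // factor, factor).mean(axis=(0, 2))
```

The result `out` is a 10x smaller image that can be plotted or saved normally.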

In HDF5 you can also write data in chunks (again via the same indexing), which matters when saving data that is too big for memory.
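A minimal sketch of chunked writing, again with h5py (PyTables offers the same pattern via `CArray`/`EArray`); the file name, band size, and random data are stand-ins:

```python
import numpy as np
import h5py

height, width, band = 100, 100, 10   # stand-in for the full-size case

with h5py.File('chunked.h5', 'w') as f:
    # Pre-allocate a chunked dataset; nothing is held in memory yet.
    dset = f.create_dataset('array1', shape=(height, width),
                            dtype='f8', chunks=(band, width))
    for i in range(0, height, band):
        # Write one band at a time (would come from the camera).
        dset[i:i + band, :] = np.random.random((band, width))
```

Matching the chunk shape to your access pattern (here, full rows) is what makes later row-wise reads efficient.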

As for TIFF: I'm not sure whether libtiff, which is used by most (all?) Python libraries with TIFF support, can read or write slices of 2D images. But I do know that you can put several images in one TIFF file (a multi-page TIFF), in a similar way to HDF5. So you could store parts of your image as several pages of one TIFF file; how best to manage this is up to your application.
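A sketch of that multi-page idea using Pillow (assumed available): store horizontal bands of the large image as pages of one TIFF, so each band can later be loaded independently. The band count and sizes here are illustrative.

```python
import numpy as np
from PIL import Image

# Five 10-row bands standing in for slices of the full image.
bands = [Image.fromarray(np.random.randint(0, 256, (10, 100)).astype(np.uint8))
         for _ in range(5)]

# Write all bands as pages of a single multi-page TIFF.
bands[0].save('banded.tif', save_all=True, append_images=bands[1:])

with Image.open('banded.tif') as im:
    print(im.n_frames)   # 5
```

Each page can then be read back with `im.seek(page_index)` without touching the other bands.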

In the end I would strongly recommend HDF5 as the file format, using its fancy indexing to build a viewer around it that zooms, downsamples, or combines both to show you the data. For performance I would hope the OS buffers parts of the HDF5 file, but there are also internal parameters (HDF5 uses chunks too) that can perhaps be tuned to your specific image size to increase efficiency.

Tip: there is no need to render data (zoomed or not) at a resolution finer than the actual monitor's.

NoDataDumpNoContribution