
I've got a GeoTIFF image that I need to make blurry by applying a smoothing filter. The image itself contains metadata that needs to be preserved. It has a bit depth of 8 and uses a color table with 256 32-bit RGBA values to look up a color for each pixel, but for the result to look smooth it will probably have to use a bit depth of 24 or 32 and no color table, or alternatively JPEG compression. What may complicate this further is that the image is 23,899×18,330 pixels, which is almost five times as large as the largest image PIL will open by default.

How can I create the blurry version of this image in Python 3?

I have also tried using PIL to just open and save it again:

from PIL import Image

# Lift PIL's decompression-bomb safeguard so the large image can be opened
Image.MAX_IMAGE_PIXELS = 1000000000
im = Image.open(file_in)
im.save(file_out)

This code doesn't crash, and I get a new .tif file that is approximately as large as the original, but when I try to open it in Windows Photo Viewer the application says it is corrupt, and PIL can no longer open it either.

I have also tried using GDAL. When I try this code, I get an output image that is 835 MB, which corresponds to an uncompressed image with a bit depth of 16 (which is also what the file metadata says when I right-click on it and choose "Properties" – I'm using Windows 10). However, the resulting image is monochrome and very dark, and the colors look like they have been jumbled up, which makes me believe that the code interprets the pixel values as intensity values rather than as color table keys.

So in order to make this method work, I need to figure out how to:

1. apply the color table (some sort of container for tuples, of type osgeo.gdal.ColorTable) to the raster band (whatever a raster band is – here it is a NumPy array with the shape (18330, 23899)) to get a new NumPy array with the shape (18330, 23899, 4) or (4, 18330, 23899) (I don't know which is correct),
2. insert this back into the loaded image and remove the color table (or create a new one with the same metadata), and
3. save the modified image with compression enabled, so I get closer to the original file size – 11.9 MB – rather than the 835 MB I get now.

How can I do that?
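If it helps, here is my understanding of the lookup in step 1, sketched with a toy-sized NumPy array. (The arrays here are placeholders: the real index array would come from the band's ReadAsArray(), and the real 256×4 table would be built from the ColorTable's entries.)

```python
import numpy as np

# Toy stand-ins: a 256-entry RGBA color table and a tiny
# palette-indexed band (the real band is 18330 x 23899)
table = np.zeros((256, 4), dtype=np.uint8)
table[:, 0] = np.arange(256)   # red ramp
table[:, 3] = 255              # fully opaque alpha
band = np.array([[0, 10], [20, 255]], dtype=np.uint8)

# NumPy fancy indexing applies the table: (H, W) -> (H, W, 4)
rgba = table[band]
print(rgba.shape)  # (2, 2, 4)
```

Every pixel value in `band` is replaced by the corresponding 4-tuple from `table`, which is exactly the (18330, 23899) → (18330, 23899, 4) expansion described above.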

HelloGoodbye
  • Can you try opening the image with GDAL in Python (as in the code you shared) and looking at the output when you print one of the raster bands? And also the output of the metadata when opened with GDAL. GDAL Python's documentation explains how you can do this – Notso Sep 07 '18 at 13:08

2 Answers


pyvips can process huge images quickly using just a small amount of memory, and supports palette TIFF images.

Unfortunately it won't support the extra geotiff tags, since libtiff won't work on unknown tag types. You'd need to copy that metadata over in some other way.

Anyway, if you can do that, pyvips should work on your image. I tried this example:

import sys
import pyvips

# the 'sequential' hint tells libvips that we want to stream the image
# and don't need full random access to pixels ... in this mode, 
# libvips can read, process and write in parallel, and without needing
# to hold the whole image in memory
image = pyvips.Image.new_from_file(sys.argv[1], access='sequential')
image = image.gaussblur(2)
image.write_to_file(sys.argv[2])

On an image of the type and size you have, generating a JPEG-compressed TIFF:

$ tiffinfo x2.tif 
TIFF Directory at offset 0x1a1c65c6 (438068678)
  Image Width: 23899 Image Length: 18330
  Resolution: 45118.5, 45118.5 pixels/cm
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: palette color (RGB from colormap)
...
$ /usr/bin/time -f %M:%e python3 ~/try/blur.py x2.tif x3.tif[compression=jpeg]
137500:2.42

So 140 MB of memory and 2.5 seconds. The output image looks correct and is 24 MB, so not too far off yours.

jcupitt
  • Thanks! Works well, except that the metadata isn't transferred, as you said. Isn't it possible to make `pyvips` keep all metadata, even though it doesn't understand it? – HelloGoodbye Sep 07 '18 at 16:47
  • Sadly not, it's a problem with libtiff -- it wasn't designed in such a way that it can copy fields it doesn't know about. GeoTIFF adds a whole set of extra fields which just look like gibberish to libtiff. We'd need to switch libvips over to the geotiff libraries, or perhaps add a new load/save pair, if geotiff and libtiff can co-exist in the same program. – jcupitt Sep 07 '18 at 17:27
  • You could also use pyvips to load the tiff and render the blurred image to a huge numpy array, then use gdal to load the original tiff, swap the pixels for the blurred ones from pyvips, and use gdal to write the image back. There's an example here showing how to do pyvips -> numpy https://github.com/jcupitt/pyvips/blob/master/examples/pil-numpy-pyvips.py – jcupitt Sep 07 '18 at 17:30
  • Following your latter suggestion, how do I swap the pixels for the blurred ones? I have posted a new question about that [here](https://stackoverflow.com/questions/52261722/python-3-how-to-change-image-data-in-gdal). – HelloGoodbye Sep 10 '18 at 16:10
  • Sorry :( I've never used gdal :( I assume you must be able to write new pixels to an image after load, but I don't know exactly how. – jcupitt Sep 11 '18 at 06:49

A raster band is just the name given to each "layer" of the image; in your case they will be the red, green, blue, and alpha values. These are what you want to blur. You can open the image and save each band to a separate array by using data.GetRasterBand(i) to get the ith band (with 1-based indexing, not 0-based) of the image you opened using GDAL.

You can then try and use SciPy's scipy.ndimage.gaussian_filter to achieve the blurring. You'll want to send it an array that is shape (x,y), so you'll have to do this for each raster band individually. You should be able to save your data as another GeoTIFF using GDAL.
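A minimal sketch of that per-band blur, assuming the palette has already been expanded so the pixel data is a plain 8-bit (H, W, 4) RGBA array (the random array below is just a stand-in for the real bands read via GetRasterBand(i).ReadAsArray()):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Placeholder data: a small RGBA array standing in for the four
# raster bands of the real image
rgba = np.random.default_rng(0).integers(0, 256, (64, 48, 4)).astype(np.uint8)

# Blur each band separately with a 2-D Gaussian; sigma controls
# the amount of smoothing
blurred = np.stack(
    [gaussian_filter(rgba[:, :, i], sigma=2) for i in range(4)],
    axis=-1,
)
print(blurred.shape)  # (64, 48, 4)
```

Note that blurring the alpha band too keeps the edges of transparent regions consistent with the blurred colors.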


If the colour table you are working with means that your data is stored in each raster band in some odd format that isn't just floats between 0 and 1 for each of R, G, B, and A, then consider using scipy.ndimage.generic_filter, although without knowing how your data is stored it's hard to give specifics on how you'd do this.

Notso